Hopsworks is an open platform for developing and operating AI systems at scale that can be deployed on any Kubernetes cluster, from public clouds to sovereign air-gapped data centers. Hopsworks can be considered an alternative to MLOps platforms such as AWS SageMaker, GCP Vertex AI, and Databricks (for AI), but it offers higher-performance real-time AI and better Python integration with the Lakehouse, as shown in our peer-reviewed SIGMOD’24 research paper, and it is the OG feature store for ML.
Hopsworks provides both data and compute support, see Figure 1. There is a Lakehouse layer (Delta Lake, Apache Hudi, with Iceberg coming soon) to store large amounts of historical feature data for training AI models and for batch inference. There is also a low-latency database, RonDB, that we develop for real-time AI workloads (including specialized low-latency query support and snowflake schema data models). Hopsworks also provides compute (Jobs, Notebooks) on Kubernetes, with support for Python, Spark, and Ray, and GPU sharing/optimization at scale. However, you can also bring your own compute, in which case you use Hopsworks primarily as a data layer to integrate your AI pipelines. For this, Hopsworks also provides a model registry and supports deploying models on KServe/vLLM.
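For readers unfamiliar with the platform, the sketch below shows roughly what working against this data layer looks like from Python. It is a minimal, illustrative example assuming the Hopsworks Python client (`hopsworks` on PyPI); the feature group name and schema are made up for the example.

```python
import hopsworks
import pandas as pd

# Log in to a Hopsworks cluster (Hopsworks Serverless or your own deployment).
project = hopsworks.login()

# The feature store is the gateway to both the Lakehouse (offline) data
# and the low-latency online store backed by RonDB.
fs = project.get_feature_store()

# A hypothetical feature group holding per-user transaction features.
fg = fs.get_or_create_feature_group(
    name="user_transactions",   # made-up name for illustration
    version=1,
    primary_key=["user_id"],
    online_enabled=True,        # also materialize to RonDB for real-time AI
)

# Write a small DataFrame of features to both the offline and online stores.
df = pd.DataFrame({"user_id": [1, 2], "tx_count_7d": [14, 3]})
fg.insert(df)

# Models trained on these features can be registered here and later served on KServe.
mr = project.get_model_registry()
```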
Hopsworks serverless is a freemium version of Hopsworks where you can store up to 50GB of Lakehouse data, 100MB of low-latency feature data in RonDB, up to 100 models in the model registry, and serve 2 model deployments. Hopsworks freemium is widely used to run serverless AI applications.
Hopsworks Serverless, in effect, provides mostly free storage to users, but no free compute, which kept our hosting costs down to $8k/month on AWS. We could scale from 8k to 20k users without much additional cost (an extra TB of S3 storage only costs around $25/month). However, egress costs on AWS were a significant risk. In 2024, we released the Hopsworks Query Service, which provides high-throughput read access to feature data from Python clients (using Arrow and DuckDB). Suddenly, clients could easily read 100s of MB or even GBs into Pandas DataFrames, as sketched below. We looked nervously at the growth in data egress and the projected cost increases and decided to act. We knew OVH, given our shared European roots, and found that they provide all the managed services we need to run Hopsworks: managed Kubernetes, a managed container registry, and S3-compatible object storage. Installing Hopsworks on OVH with Helm charts worked great, so we decided to migrate Hopsworks Serverless to OVHCloud, but we kept it in North America, where most of our users are, to maintain existing latency for those users.
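To make the egress concern concrete, a client-side read through the Hopsworks Query Service looks roughly like this; a single call can pull hundreds of MB of Arrow-encoded feature data into a Pandas DataFrame. Again a hedged sketch assuming the Hopsworks Python client, with an illustrative feature group name.

```python
import hopsworks

project = hopsworks.login()
fs = project.get_feature_store()

# Read historical feature data from the Lakehouse into Pandas.
# The Query Service streams Arrow data to the client, where Arrow/DuckDB
# handle the conversion, so large reads are fast and easy, and on AWS every
# byte read from outside the cloud counted as billable egress.
fg = fs.get_feature_group("user_transactions", version=1)  # illustrative name
df = fg.read()
print(df.shape)
```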
We really only had Kubernetes and S3 as dependencies. Hopsworks has its own observability stack, based on OpenSearch and OpenSearch Dashboards, as well as its own metrics stack, based on Prometheus/Grafana. We have always been careful not to use cloud-specific services in Hopsworks. Here is a comparison of the main services we considered.
Both AWS and OVH offer managed Kubernetes services. AWS wins on maturity, but OVH wins on pricing (the managed control plane is free!).
OVH has long been known for not charging for public egress, while AWS is infamous for its predatory public egress costs. Cloud egress costs are documented on this site. OVH does charge for egress in some (newer?) regions, albeit at around 1/8th of AWS's price, and many OVH regions still do not charge for egress today (March 2025).
AWS S3 is the premier cloud-based object storage service. It may have higher availability, but it is three times more expensive than OVH's S3-compatible object storage.
AWS’ Elastic Container Registry (ECR) is a mature, configurable, scalable managed service. OVH's managed Harbor registry is available in only 3 different sizes at a fixed monthly cost, raising potential scalability concerns for those with storage requirements above 5 TB or with high concurrency requirements.
Hopsworks tolerates Availability Zone failures by replicating services across instances in different availability zones. This results in network traffic between instances in different availability zones.
Some Hopsworks services require persistent volumes, which are provided by AWS Elastic Block Storage and OVH Block Storage (implemented using Ceph). Hopsworks also has instances that use local (NVMe) disks. OVH has higher-throughput NVMe disks available at lower storage capacities (1-4 TB) compared to AWS. We use these instances for our database, RonDB, but they are not included in the table below.
We notified our users of a maintenance window of 24 hours on November 26th, 2024. We backed up the Hopsworks cluster to an AWS S3 bucket on that day and then migrated that bucket to an S3 bucket in OVHCloud. This made the migration, with some downtime, relatively painless. The Hopsworks cluster on OVH was deployed with Helm charts, and we followed a testing process before re-opening Hopsworks for logins. No users contacted us about anything untoward in their accounts as a result of the migration.
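The bucket-to-bucket copy itself is conceptually simple. The sketch below shows one way to do it with boto3, streaming each object from the AWS backup bucket to an OVH S3-compatible bucket; the bucket names, endpoint URL, and credentials are placeholders, and in practice a dedicated tool such as rclone can do the same job.

```python
import boto3

# Source: the backup bucket on AWS S3 (placeholder name, default AWS credentials).
src = boto3.client("s3")
SRC_BUCKET = "hopsworks-serverless-backup"

# Destination: OVH's S3-compatible object storage (placeholder endpoint and keys).
dst = boto3.client(
    "s3",
    endpoint_url="https://s3.<region>.io.cloud.ovh.net",
    aws_access_key_id="<ovh-access-key>",
    aws_secret_access_key="<ovh-secret-key>",
)
DST_BUCKET = "hopsworks-serverless"

# Stream every object from the source bucket to the destination bucket.
paginator = src.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SRC_BUCKET):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        body = src.get_object(Bucket=SRC_BUCKET, Key=key)["Body"]
        dst.upload_fileobj(body, DST_BUCKET, key)
        print(f"copied {key}")
```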
In Q4 2024, we completed the migration from AWS, seamlessly transitioning thousands of users to a resilient Kubernetes-based infrastructure on OVHCloud. Although OVH and Hopsworks are technologies built in Europe, the Hopsworks Serverless service is located in North America, where most Hopsworks users are located and where OVH also provides cloud capacity. In Europe, Hopsworks and OVH have since become partners providing a sovereign AI platform for developing and operating AI systems at scale. We like OVH's simpler and lower pricing: not just network egress, but most services are lower cost, and the quality is generally good.