Hopsworks' Kubernetes migration marks a significant milestone in the company's evolution. By adopting this container orchestration platform, we aim to enhance our infrastructure's scalability, reliability, testability, and portability. This move not only streamlines our operations but also positions us to better serve our users with improved performance and resource management.
Nearly a year ago, the Hopsworks team embarked on a journey to migrate its infrastructure to Kubernetes (k8s). October 6th marked the first anniversary of our migration repository's creation. Our primary goal for this transition was to enhance the scalability, reliability, testability, and portability of Hopsworks' components. Given Hopsworks' extensive range of services, container technologies like docker-compose proved inadequate, mostly due to performance: Hopsworks 4.0 runs 50+ containers concurrently, excluding those that the Hopsworks platform can create dynamically.
By leveraging Kubernetes' container orchestration capabilities, we aimed to streamline our deployment processes, enhance resource utilization, and provide a more robust platform for our users. This move reduces costs for us and, more importantly, for our customers. Moreover, one of the key advantages of moving Hopsworks to Kubernetes is its self-healing capabilities: Kubernetes automatically monitors the health of containers and can restart, reschedule, or replace them if they fail or become unresponsive. This ensures higher availability and reliability of our services, reducing downtime and manual intervention.
Finally, by abstracting away the complexities of networking, Kubernetes allows our developers to focus on building and scaling Hopsworks features without worrying about the underlying infrastructure. In this article, we describe the three main pillars of our Kubernetes migration.
We started the journey by containerizing each of the Hopsworks services. All our images are repository-based: each is rebuilt whenever new changes are merged into the main branch of its source repository. The orchestration of these images is managed by our "migration" mono-repo. In practice, this repository is a Helm chart (or package) for the entire Hopsworks deployment.
Helm is a package manager and template engine for Kubernetes that simplifies the process of defining, installing, and upgrading even the most complex Kubernetes applications. With Helm, we can install and deploy Hopsworks in a Kubernetes cluster using a simple helm install command. Concretely, Helm allows us to: 1) manage the complexity of a Hopsworks installation as a single unit; 2) easily update and roll back deployments; 3) share our Hopsworks deployment configuration across different environments; and 4) customize deployments through different values files, which also improves testability.
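As an illustration (the keys below are hypothetical, not the chart's actual schema), a per-environment values file plus a single command is all it takes to stand up, upgrade, or roll back a deployment:

```yaml
# values.prod.yaml -- hypothetical per-environment overrides
hopsworks:
  ingress:
    host: hopsworks.example.com   # illustrative key and value
rondb:
  replicaCount: 3                 # illustrative key and value

# The whole platform is then managed as a single unit:
#   helm install hopsworks ./hopsworks -f values.prod.yaml
#   helm upgrade hopsworks ./hopsworks -f values.prod.yaml
#   helm rollback hopsworks 1
```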
The Hopsworks chart consists of one subchart per service required by Hopsworks. In short, each subchart is a composition of Kubernetes resources: Deployments, StatefulSets, Services, Ingresses, and autoscalers. Deployments and StatefulSets are composed of pods, the basic unit of work in Kubernetes. Pods define groups of containers that run together on the same node, ultimately enabling the creation of the components that collectively form Hopsworks.
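For illustration, the umbrella chart's Chart.yaml could look along these lines; the subchart names and versions here are examples rather than the chart's actual contents:

```yaml
# Chart.yaml of the umbrella chart (abridged, illustrative)
apiVersion: v2
name: hopsworks
description: Helm chart for the entire Hopsworks deployment
version: 4.0.0
dependencies:
  - name: rondb            # database subchart
    version: 1.0.0
    repository: file://charts/rondb
  - name: arrowflight      # query service subchart
    version: 1.0.0
    repository: file://charts/arrowflight
  # ...one entry per Hopsworks service
```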
One of the key advantages of Kubernetes is its built-in reconciliation loops, which reduce the need for custom in-house logic. These loops constantly compare the cluster's actual state against the desired state and make adjustments until the two match. For example, k8s can delete (evict) pods and recreate them on other, less stressed nodes. This automation simplifies management tasks and enhances system reliability by minimizing human error and intervention. However, we found that a fully declarative definition of the Hopsworks deployment wasn't ideal for our needs.
When we started the migration, we simply declared all the services and expected that, after some time, Kubernetes' reconciliation mechanisms would bring the deployment alive. That works; yet this approach had a major drawback: making services fully declarative caused Kubernetes to cycle through "not-ready → restart → not-ready → ready" loops. With dozens of containers trying to interact with each other while some of them are not yet ready, it took hours for the deployment to become steady.
When a container enters a crash loop, Kubernetes applies the exponential backoff delay described in its container restart policy. This mechanism prevents a faulty container from overwhelming the system with continuous failed start attempts.
The main issue with this loop was that we wasted precious seconds between pod/container restarts. In other words, if a pod is scheduled and ready to execute its containers, we should use that capacity rather than waste it because a dependent service isn't ready. Therefore, we implemented a more nuanced approach than just relying on the native k8s reconciliation loop. For instance, it is unreasonable to start certain services if our database service (RonDB) isn't up yet.
With this approach, our deployment time has decreased dramatically, from 2 hours to under 30 minutes after running a helm install. We achieved this by striking a balance between imperatively setting service dependencies and maintaining a fully declarative cluster state. Since Helm doesn't provide a mature orchestration mechanism, we implemented several generic lock-wait mechanisms to assist the k8s orchestration: initContainers, k8s leases, and lastly configMaps mounted as volumes.
Let us illustrate the case with a simplified example. The snippet below uses a generic initContainer as an as-simple-as-possible, high-level "wait-before-continue" check: in the end, it is just a shell script that verifies the MySQL service is up and running before the main containers start. (The names, image, and port in the sketch are illustrative.)
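```yaml
# Illustrative pod spec: block until MySQL answers before the app starts.
apiVersion: v1
kind: Pod
metadata:
  name: hopsworks-service            # illustrative
spec:
  initContainers:
    - name: wait-for-mysql
      image: busybox:1.36
      command:
        - sh
        - -c
        - |
          # Retry until the MySQL service accepts TCP connections.
          until nc -z mysql.hopsworks.svc.cluster.local 3306; do
            echo "waiting for mysql..."
            sleep 2
          done
  containers:
    - name: main
      image: hopsworks/service:latest   # illustrative
```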
Notice that the main containers still need to start up, but at this point, the pod is already scheduled and the resources for the containers are assigned. Thus, once the initContainers have successfully finished, the main containers of the pod will run in just a few seconds.
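The lease-based mechanism follows the same spirit. As a sketch (the resource names are illustrative, and the polling pod's service account needs RBAC permission to read leases), a one-time setup job publishes a Lease once it completes, and dependent pods poll for its existence in an initContainer:

```yaml
# Illustrative Lease acting as a cluster-wide "setup finished" marker,
# created by a one-time setup job after it succeeds.
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: hopsworks-setup-done
  namespace: hopsworks
spec:
  holderIdentity: setup-job

# A dependent pod's initContainer then simply polls until it appears:
#   until kubectl -n hopsworks get lease hopsworks-setup-done; do sleep 2; done
```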
By assisting the k8s orchestration with lock-wait mechanisms like the one described above, we have significantly enhanced our development process. The 30-minute deployment time now allows us to test every pull request (PR) in the Helm chart repository. Whenever we add a new configuration, modify a service, or create a new image, we thoroughly test the change: we spin up a new Kubernetes cluster, install the Hopsworks Helm chart, and verify the deployment's readiness. This approach drastically reduces bugs. Furthermore, using containers as individual units of work enables higher observability and hence faster issue detection when problems arise. Notably, we've integrated Hopsworks' infrastructure code into our company's testing pipeline.
We apply the same testing process on real cloud providers. Spinning up a Kubernetes cluster and testing a 30-minute deployment on a real cloud provider like OVH, our partner, is entirely feasible. We call this the "1-euro test", since spinning up a cloud cluster with a decent number of machines costs us less than 1 euro, a small price compared to the benefits. In a sense, by doing this, we also certify our compatibility with different cloud providers.
We want to remark that we haven't abandoned our bare-metal roots. While Kubernetes serves as an abstraction layer, allowing us to reach more customers without requiring them to provide custom on-premise node configurations, we still offer Hopsworks installations on bare-metal setups. In fact, our current CI/CD builds a Kubernetes cluster from scratch, preserving our expertise for on-premise installations as well.
In the migration process, we keep fine-tuning our Helm chart components to optimize resource consumption by our services. We've integrated several autoscalers at the pod level. As a result, together with auto-scalable node pools, we only use the resources we need, when we need them.
In addition to the basic CPU and RAM requests that trigger cluster scaling, we've integrated several custom metrics through various custom-built Prometheus exporters. For example, the snippet below demonstrates how we use custom metrics from our ArrowFlight Query Service deployment to scale up and down. This scaling is based on arrowflight_request_queue_gauge and arrowflight_queue_time_gauge, activating when the queue has more than 10 pending requests or the response time exceeds 30ms.
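A simplified sketch of such an autoscaler follows; the thresholds come from the description above, the remaining fields are illustrative, and the gauges are assumed to be exposed to the custom metrics API through a Prometheus adapter:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: arrowflight              # illustrative
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: arrowflight
  minReplicas: 1
  maxReplicas: 8                 # illustrative bound
  metrics:
    - type: Pods
      pods:
        metric:
          name: arrowflight_request_queue_gauge
        target:
          type: AverageValue
          averageValue: "10"     # scale out beyond 10 pending requests
    - type: Pods
      pods:
        metric:
          name: arrowflight_queue_time_gauge
        target:
          type: AverageValue
          averageValue: "30m"    # 0.030, i.e. 30ms assuming seconds
```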
Overall, optimization of resource usage translates directly into cost savings for our customers. By leveraging Kubernetes' efficient resource allocation and our fine-tuned configurations, we've been able to significantly reduce the overall infrastructure costs compared to our previous virtual-machine-based setup. This not only makes our platform more cost-effective but also allows for greater flexibility in scaling resources up or down based on actual usage patterns.
Moreover, our resource fine-tuning is also based on load tests we've developed over the years. These tests simulate user interactions through the Hopsworks API and run nightly on a newly spun-up cluster, as previously mentioned. We then use resource suggestion mechanisms, such as KRR, to adjust the resource allocation of our services to match potential user needs.
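The outcome of such a recommendation is ultimately just adjusted requests and limits in the corresponding values file, along these lines (the service key and numbers are made up for illustration):

```yaml
# Illustrative requests/limits after applying a KRR-style recommendation
onlinefs:
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
    limits:
      memory: 1Gi
```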
To summarize: we began the Kubernetes migration by containerizing each service and orchestrating them with Helm, which simplified installation, updates, and deployment management. We then implemented a more precise orchestration approach that reduced deployment time from 2 hours to 30 minutes, enabling quicker testing and issue detection. In addition, we achieved cost reductions by fine-tuning resources and autoscaling based on custom Hopsworks metrics, ensuring efficient resource use and cloud compatibility.
Hopsworks' Kubernetes migration brought significant improvements in scalability, reliability, testability, and portability. For instance, Kubernetes' self-healing and automation features improved reliability by minimizing downtime. Overall, migrating to k8s streamlined operations, reduced costs, and enhanced performance for us and, above all, our customers.