Next-Level Reliability: Multi-Region High Availability Comes to Feature Stores

January 26, 2024

High availability in machine learning is essential for maintaining operational continuity, scalability, and resilience, and it contributes directly to the reliability and operability of ML systems. Feature stores are an integral part of mission-critical ML systems (such as applications for live credit scoring and real-time fraud detection), as data engineers need to frequently curate new features to train more efficient models. Hence, engineers need to be able to rely on the feature store to remain operational even when unexpected failures occur.

Read our 2-part article by Antonios Kouzoupis, Software Engineer at Hopsworks, explaining the resilient architecture of the Hopsworks feature store. Dive into the articles to learn how the architecture is built to ensure operational continuity, global accessibility, fault tolerance, and a seamless user experience.

Single Region Highly Available Hopsworks

Explore the components of the Hopsworks Feature Store and the technologies that make the system highly available and fault tolerant.

Read part 1

Multi-Region Architecture for Demanding Applications

We expand on the architecture to fit a Tier 1 classification, where all components of Hopsworks are replicated in a different geographical region.

Read part 2
