
ROI of Feature Stores

March 1, 2023 (7 min read)

Article updated on January 10, 2024

Jim Dowling

CEO and Co-Founder

Hopsworks

TL;DR

An analysis of the costs and benefits of feature stores for machine learning, with estimates of their return on investment.

When you invest money in machine learning (ML), you typically start by investing in people. You hire data scientists, data engineers, and ML engineers to transform your data into insights that can help you both reduce costs and increase revenue. However, if you do not manage the ML assets you create (the feature engineering jobs, the feature data, the models, and the CI/CD pipelines), the cost of each ML project will stay roughly constant: every new project will start over from scratch, and your ML readiness will grow more slowly than that of the leading companies in your field. That is because the leading ML companies have all invested in building a data platform for ML (aka a feature store).

In this blog, we present a cost-benefit analysis of feature stores for ML, identifying some of the cost reductions and productivity improvements they bring:

reduced cost of ML projects through feature reuse;
reduced time-to-market for models;
reduced cost of ML operations through more efficient feature engineering;
reduced model risk by early detection of bias in data.
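These four benefits can be folded into a back-of-the-envelope ROI estimate. The sketch below is a minimal illustration, assuming you can put an annual monetary value on each benefit and on the cost of running a feature store. The function name, category keys, and all figures are hypothetical placeholders, not measured data.

```python
def feature_store_roi(annual_savings: dict[str, float], annual_cost: float) -> float:
    """ROI as a fraction: (total annual savings - annual cost) / annual cost."""
    if annual_cost <= 0:
        raise ValueError("annual_cost must be positive")
    total_savings = sum(annual_savings.values())
    return (total_savings - annual_cost) / annual_cost


# Hypothetical, illustrative figures: replace with your own estimates.
savings = {
    "feature_reuse": 300_000.0,
    "time_to_market": 150_000.0,
    "ml_operations": 100_000.0,
    "model_risk": 50_000.0,
}
roi = feature_store_roi(savings, annual_cost=200_000.0)
print(f"ROI: {roi:.0%}")  # ROI: 200%
```

The hard part in practice is estimating each savings figure, which the following sections discuss one benefit at a time.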

Reduced cost of ML projects through feature reuse

The cost of the first few ML projects will not be substantially reduced if a company has a feature store. However, as more engineered features become available in the feature store, they can be reused by different teams in many different ML pipelines. Because feature engineering is estimated to account for around 80% of the effort in ML projects, reusing features substantially reduces the cost of both developing and maintaining ML projects. With a well-populated feature store, organizations can expect to productionize many more models, at much lower cost, with fewer data scientists.

Twitter evaluates the success of its feature store based on how widely features are shared across teams.
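The reuse argument can be made concrete with a toy cost model. This is a sketch under stated assumptions, not a validated model: feature engineering is taken to be 80% of a project's cost (the estimate quoted above), and project number k is assumed to reuse a fraction `min(max_reuse, k * reuse_step)` of its features, a hypothetical linear growth curve. `shared_feature_ratio` is one hypothetical way to quantify "how widely features are shared"; it is not Twitter's actual metric.

```python
FEATURE_ENGINEERING_SHARE = 0.8  # the ~80% estimate cited in the text


def project_cost(base_cost: float, reuse_fraction: float) -> float:
    """Cost of one project when `reuse_fraction` of its features already exist."""
    if not 0.0 <= reuse_fraction <= 1.0:
        raise ValueError("reuse_fraction must be in [0, 1]")
    feature_work = base_cost * FEATURE_ENGINEERING_SHARE * (1.0 - reuse_fraction)
    other_work = base_cost * (1.0 - FEATURE_ENGINEERING_SHARE)
    return feature_work + other_work


def cumulative_costs(n_projects: int, base_cost: float,
                     reuse_step: float, max_reuse: float) -> tuple[float, float]:
    """Total cost of n projects without and with a feature store.

    Assumes project k can reuse min(max_reuse, k * reuse_step) of its features.
    """
    without_store = n_projects * base_cost
    with_store = sum(
        project_cost(base_cost, min(max_reuse, k * reuse_step))
        for k in range(n_projects)
    )
    return without_store, with_store


def shared_feature_ratio(usage: dict[str, set[str]]) -> float:
    """Fraction of features used by more than one team."""
    if not usage:
        return 0.0
    return sum(1 for teams in usage.values() if len(teams) > 1) / len(usage)


without, with_fs = cumulative_costs(5, 100.0, reuse_step=0.1, max_reuse=0.5)
print(round(without), round(with_fs))  # 500 420
print(shared_feature_ratio({"txn_count_7d": {"fraud", "marketing"},
                            "avg_basket": {"marketing"}}))  # 0.5
```

The first project costs the same either way; savings only accumulate as the store fills up, which matches the point that early projects see little benefit.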

Reduced time-to-market for models

Because feature engineering typically accounts for around 80% of the effort in ML projects, the availability of ready-made features in the feature store enables organizations to release models in significantly less time than without one. The feature store also saves time by largely eliminating the need for exploratory data analysis (EDA), as feature distributions and descriptive statistics are precomputed and available in the feature store. On top of this, there is an improved division of labor. Data engineers are more skilled at writing feature pipelines that ingest and transform raw data from backend databases, data warehouses, and data lakes, and having them do this work frees data scientists to develop more and better models.
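The EDA saving comes from computing statistics once, when feature data is written, rather than every time someone explores it. The sketch below illustrates that idea with a hypothetical `ToyFeatureGroup` class; it is not the Hopsworks API, and recomputing every statistic on each insert is a simplification (a real system would compute them incrementally or inside the pipeline job).

```python
import statistics
from dataclasses import dataclass, field


@dataclass
class ToyFeatureGroup:
    """Toy stand-in for a feature group: rows plus statistics computed on write."""
    name: str
    rows: list[dict[str, float]] = field(default_factory=list)
    stats: dict[str, dict[str, float]] = field(default_factory=dict)

    def insert(self, new_rows: list[dict[str, float]]) -> None:
        """Append rows, then recompute descriptive statistics for every feature."""
        self.rows.extend(new_rows)
        if not self.rows:
            return
        for column in self.rows[0]:  # assumes every row has the same columns
            values = [row[column] for row in self.rows]
            self.stats[column] = {
                "count": float(len(values)),
                "mean": statistics.fmean(values),
                "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
                "min": min(values),
                "max": max(values),
            }


fg = ToyFeatureGroup("transactions")
fg.insert([{"amount": 10.0}, {"amount": 20.0}, {"amount": 30.0}])
# A data scientist reads the precomputed statistics instead of rescanning the data.
print(fg.stats["amount"])
```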

Reduced cost of ML Operations through more efficient feature engineering

A feature store gives an immediate 50% reduction in the cost of maintaining feature engineering pipelines for online applications, as only one feature pipeline is needed to fill both the online and offline feature stores, not two. Without a feature store, features are computed (and often implemented) twice: once to serve features to the online application (performing model inference) and once to build train/test datasets for training models. Without a feature store, you can expect increased operational costs to ensure the consistency of both implementations of the features (serving and training). This consistency problem is technical debt that can be paid down ahead of time by having a feature store.
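The "one pipeline, two stores" idea can be sketched in a few lines. In this minimal illustration, assuming plain Python containers stand in for the offline store (full history, used to build training data) and the online store (latest values per entity, used at inference), a single feature definition feeds both, so training and serving cannot drift apart. All function names and fields (`engineer_features`, `customer_id`, `amount`, `timestamp`) are hypothetical.

```python
from datetime import datetime


def engineer_features(raw: dict) -> dict:
    """The single feature definition, shared by training and serving."""
    ts = datetime.fromisoformat(raw["timestamp"])
    return {
        "customer_id": raw["customer_id"],
        "amount": raw["amount"],
        "is_large_amount": raw["amount"] > 1000.0,
        "hour_of_day": ts.hour,
    }


def run_feature_pipeline(raw_rows: list[dict], offline_store: list[dict],
                         online_store: dict[str, dict]) -> None:
    """One pipeline fills both stores from the same computed features.

    Assumes raw_rows arrive in timestamp order, so the last write per
    customer is the latest value.
    """
    for raw in raw_rows:
        features = engineer_features(raw)
        offline_store.append(features)                    # history for training
        online_store[features["customer_id"]] = features  # latest for serving


offline, online = [], {}
raw_rows = [
    {"customer_id": "c1", "amount": 50.0, "timestamp": "2023-03-01T09:15:00"},
    {"customer_id": "c1", "amount": 2500.0, "timestamp": "2023-03-01T18:40:00"},
]
run_feature_pipeline(raw_rows, offline, online)
print(len(offline), online["c1"]["hour_of_day"])  # 2 18
```

Without this shared definition, the serving path would need its own reimplementation of `engineer_features`, which is exactly the duplicated code and consistency risk described above.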

Reduced model risk by early detection of bias in data

“Data is biased... But learning algorithms themselves are not biased... Bias in data can be fixed.”

Yann LeCun on how to tackle the bias problem in ML

When features have not been battle-tested and validated, there is a risk that features will reveal sensitive information, or that models will make biased predictions (for example, a model may perform better on some slices of the data than on others). Models that produce different predictions based on the race or gender of users are particularly high-risk for consumer companies.
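A simple way to surface the slice problem is to compute a metric separately for each value of a sensitive attribute and flag large gaps. The sketch below does this for accuracy; the function names, the toy labels, and the `GAP_THRESHOLD` of 0.1 are hypothetical choices for illustration, and real fairness audits usually look at several metrics, not accuracy alone.

```python
from collections import defaultdict


def accuracy_by_slice(y_true: list, y_pred: list, slices: list) -> dict:
    """Accuracy computed separately for each value of a slicing attribute."""
    if not (len(y_true) == len(y_pred) == len(slices)):
        raise ValueError("y_true, y_pred and slices must have equal length")
    correct: defaultdict = defaultdict(int)
    total: defaultdict = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, slices):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def max_slice_gap(per_slice: dict) -> float:
    """Largest accuracy difference between any two slices."""
    return max(per_slice.values()) - min(per_slice.values())


GAP_THRESHOLD = 0.1  # hypothetical tolerance; choose per use case

per_slice = accuracy_by_slice(
    y_true=[1, 0, 1, 1, 0, 1],
    y_pred=[1, 0, 1, 1, 1, 0],
    slices=["A", "A", "A", "B", "B", "B"],
)
gap = max_slice_gap(per_slice)
if gap > GAP_THRESHOLD:
    print(f"Possible bias: accuracy gap {gap:.2f} across slices {per_slice}")
```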

A feature store integrated into an ML pipeline can provide early warning of anomalies in training and serving data. One mechanism is to automatically identify, and notify users of, feature drift: anomalous changes in feature values or in their distribution. The feature store also makes it easier for Data Scientists to run more extensive experiments that analyze models and link model performance or bias to individual features in the feature store.
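Drift detection can be sketched with the Population Stability Index (PSI), a widely used statistic for comparing a reference distribution (e.g. training data) with a current one (e.g. serving data). PSI is a swapped-in example technique, not necessarily what Hopsworks uses internally. This version assumes equal-width bins over the reference range, and the 0.25 alert level in the usage lines is a commonly quoted rule of thumb, not a standard.

```python
import math


def population_stability_index(reference: list[float], current: list[float],
                               n_bins: int = 10, eps: float = 1e-6) -> float:
    """PSI between a reference sample and a current sample.

    Uses equal-width bins over the reference range; values outside that range
    are clipped into the first or last bin.
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for a constant feature

    def bin_fractions(values: list[float]) -> list[float]:
        counts = [0] * n_bins
        for x in values:
            idx = min(max(int((x - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0)
        return [max(c / len(values), eps) for c in counts]

    expected = bin_fractions(reference)
    actual = bin_fractions(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


training = [i / 100 for i in range(100)]
serving_same = list(training)
serving_shifted = [0.5 + i / 200 for i in range(100)]
print(population_stability_index(training, serving_same))            # 0.0
print(population_stability_index(training, serving_shifted) > 0.25)  # True
```

In a pipeline, a check like this would run on each new batch of serving data per feature, with a notification raised when the score crosses the chosen threshold.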

Hopsworks Feature Store

The Hopsworks Feature Store enables teams to work effectively together, sharing outputs and assets at all stages in ML pipelines. In effect, our Hopsworks Feature Store:

acts as an API between Data Engineering and Data Science, enabling improved collaboration between Data Engineers, who engineer the features, and Data Scientists, who use the features to train models;
enables features to be registered, discovered, validated, and used as part of ML pipelines, thus making it easier to transform and validate the training data that is fed into machine learning systems;
meets traditional Enterprise Computing requirements with support for access control, feature versioning, governance (e.g., terms of use), model interpretability, privacy, and auditing;
is horizontally scalable and highly available;
fits seamlessly into existing development environments and ML pipelines, whether you are in the cloud or on-premises, with integrations for Databricks, Amazon SageMaker, and Kubeflow.
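To make the register, discover, validate, and version workflow concrete, here is a toy in-memory registry. It is a hypothetical illustration only: `FeatureDef`, `ToyFeatureRegistry`, and their methods are invented for this sketch and are not the Hopsworks API, which has its own Python client, storage, and access control.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureDef:
    """Metadata a data engineer publishes for one feature version."""
    name: str
    version: int
    owner: str
    description: str
    dtype: type


class ToyFeatureRegistry:
    """In-memory registry: data engineers register features, data scientists discover them."""

    def __init__(self) -> None:
        self._features: dict[tuple[str, int], FeatureDef] = {}

    def register(self, feature: FeatureDef) -> None:
        key = (feature.name, feature.version)
        if key in self._features:  # versions are immutable once published
            raise ValueError(f"{feature.name} v{feature.version} already exists")
        self._features[key] = feature

    def search(self, text: str) -> list[FeatureDef]:
        """Case-insensitive search over feature names and descriptions."""
        text = text.lower()
        return [f for f in self._features.values()
                if text in f.name.lower() or text in f.description.lower()]

    def latest(self, name: str) -> FeatureDef:
        versions = [f for f in self._features.values() if f.name == name]
        if not versions:
            raise KeyError(name)
        return max(versions, key=lambda f: f.version)

    def validate(self, name: str, version: int, values: list) -> bool:
        """Check that values match the registered type before they enter a pipeline."""
        expected = self._features[(name, version)].dtype
        return all(isinstance(v, expected) for v in values)


registry = ToyFeatureRegistry()
registry.register(FeatureDef("spend_7d", 1, "data-eng", "Total spend over 7 days", float))
registry.register(FeatureDef("spend_7d", 2, "data-eng",
                             "Total spend over 7 days, refunds excluded", float))
print([f.version for f in registry.search("spend")])
print(registry.latest("spend_7d").version)             # 2
print(registry.validate("spend_7d", 2, [12.5, "n/a"]))  # False
```

Making published versions immutable is a deliberate choice in this sketch: a model trained on `spend_7d` v1 keeps a stable definition even after v2 is released.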

Summary

To summarize, the table below shows one way to look at the value that a feature store can bring.
