Hopsworks compared to Vertex AI

Hopsworks Feature Store's capabilities and strengths compared to Vertex AI

Capabilities

Hopsworks
Version: 3.1

Vertex AI
Version: Jan 1st 2023

What is Hopsworks?

Hopsworks is a state-of-the-art feature store and one of the most feature-rich and versatile feature stores on the market. It offers a high degree of integration with other ecosystems and works with a wide range of data sources. Its Python APIs are easy to use and give developers great flexibility. Because features can be computed from many different sources, Hopsworks supports a seamless feature engineering workflow in which data scientists generate training datasets directly from raw data. Hopsworks is ideal for businesses that require low-latency data processing and support for multiple data sources.

What is Vertex AI?

Vertex AI is Google's ML platform. It includes a feature store that helps organizations accelerate the development of machine learning models, as part of a fully managed platform that simplifies building, training, and deploying models. Vertex AI offers a suite of pre-built algorithms and frameworks that data scientists can use to create end-to-end machine learning pipelines. Because Vertex AI is a GCP-centric platform, using it carries obvious vendor lock-in risks.

How to choose?

While both solutions have their strengths, Hopsworks is the better choice for businesses that require low-latency data processing and support for multiple data sources. Hopsworks lets data scientists define how features are computed and can read data from multiple sources, whereas Vertex AI ingests pre-computed, model-ready features from BigQuery or GCS. Vertex AI is therefore less versatile than Hopsworks, and it ultimately limits users to the Google ecosystem.

Feature Store Capabilities

Hopsworks
Vertex AI
Engineering

Feature Computation Engines

What frameworks/languages are supported to create features?
Any compute engine that writes tables to BigQuery

Feature pipelines computed from multiple Data Sources

Some feature stores ingest only pre-computed data, while others support defining feature pipelines.
No, features are ingested in model-ready form from BigQuery or GCS

Creating Training Data and Batch Inference Data

How is feature data returned in batches for training or batch inference?
Python/Spark job that returns training data or batch inference data as either a DataFrame or files (Parquet, TFRecord, CSV)
Batch job that returns a BigQuery table, CSV files, or TFRecord files

On-Demand Features

Is there support for computing features on data only available from clients at request-time?
Python UDFs
N/A
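In Hopsworks, on-demand features are written as Python UDFs. The sketch below only illustrates the idea in plain Python and is not the Hopsworks UDF API: the feature `seconds_since_last_login` is hypothetical, and it combines a value only the client knows at request time with a value assumed to come from the online store.

```python
from datetime import datetime, timezone


def seconds_since_last_login(request_time: datetime, last_login: datetime) -> float:
    """Hypothetical on-demand feature.

    request_time is only known when the client sends its request, so the
    feature cannot be precomputed; last_login would be read from the
    online feature store.
    """
    if request_time.tzinfo is None or last_login.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    return (request_time - last_login).total_seconds()


last_login = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)    # from the online store
request_time = datetime(2023, 1, 1, 12, 5, tzinfo=timezone.utc)  # from the client request
print(seconds_since_last_login(request_time, last_login))  # 300.0
```

The same function can be called in the training pipeline (with historical request times) and in the online inference pipeline, so both compute the feature identically.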

Data types

What (Python) language-level data types are supported?
Most Spark and Pandas datatypes (including timestamps and arrays)
Value Types

Datatype for entity/primary keys

What (Python) language-level data types are supported by the feature store for defining primary keys for entities?
Strings

Versioning

Does the platform provide support for versioning of features or Feature Tables/Groups?
N/A - Semantic versioning using names
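Where versioning is done through names rather than by the platform, clients have to parse names themselves to find a version. A minimal sketch, assuming a hypothetical `<base>_v<N>` naming convention; note that versions must be compared numerically, since `_v10` sorts before `_v2` as a string.

```python
import re
from typing import Optional

# Hypothetical naming convention: <base>_v<N>, e.g. customer_features_v2.
VERSIONED_NAME = re.compile(r"^(?P<base>[a-z][a-z0-9_]*?)_v(?P<version>\d+)$")


def latest_version(names: list[str], base: str) -> Optional[str]:
    """Return the name with the highest numeric version for `base`, or None."""
    best_name, best_version = None, -1
    for name in names:
        match = VERSIONED_NAME.match(name)
        if match and match["base"] == base and int(match["version"]) > best_version:
            best_name, best_version = name, int(match["version"])
    return best_name


names = ["customer_features_v1", "customer_features_v10", "customer_features_v2", "orders_v3"]
print(latest_version(names, "customer_features"))  # customer_features_v10
```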

Data Validation

Is there support for validating data in feature pipelines before the features are written to the feature store?
N/A
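Validating a feature pipeline means checking each batch against expectations before it is written. The library-free sketch below assumes rows as dicts and two hypothetical expectations (a non-null key and a value range). Hopsworks lists Great Expectations for this purpose, and its API is not reproduced here.

```python
from typing import Any, Callable

Row = dict[str, Any]
Check = Callable[[Row], bool]


def not_null(column: str) -> Check:
    return lambda row: row.get(column) is not None


def in_range(column: str, low: float, high: float) -> Check:
    return lambda row: row.get(column) is not None and low <= row[column] <= high


def validate(rows: list[Row], expectations: dict[str, Check]) -> dict[str, int]:
    """Count failing rows per expectation; an empty result means the batch is valid."""
    failures: dict[str, int] = {}
    for name, check in expectations.items():
        failed = sum(1 for row in rows if not check(row))
        if failed:
            failures[name] = failed
    return failures


batch = [
    {"customer_id": "c1", "age": 34},
    {"customer_id": None, "age": 41},
    {"customer_id": "c3", "age": 230},
]
expectations = {
    "customer_id_not_null": not_null("customer_id"),
    "age_between_0_and_120": in_range("age", 0, 120),
}
report = validate(batch, expectations)
# The pipeline would write `batch` to the feature store only if `report` is empty.
print(report)  # {'customer_id_not_null': 1, 'age_between_0_and_120': 1}
```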

Feature Testing and CI/CD

Best practices for testing and CI/CD for feature development in machine learning.
Supports industry-standard DevOps processes, with Git, PyTest, and CI/CD services (Jenkins, GitHub Actions, etc.)
N/A
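A feature function can be unit-tested like any other Python code. A minimal PyTest-style sketch, where the feature function `normalize_email` is hypothetical: PyTest collects functions named `test_*` and reports a failing `assert` as a test failure, so a CI service only needs to run `pytest` on each commit.

```python
def normalize_email(email: str) -> str:
    """Hypothetical feature transformation: canonical form of an email address."""
    return email.strip().lower()


def test_normalize_email_strips_and_lowercases():
    assert normalize_email("  Alice@Example.COM ") == "alice@example.com"


def test_normalize_email_is_idempotent():
    once = normalize_email("Bob@Example.com")
    assert normalize_email(once) == once
```

Saved as `test_features.py`, the tests run with `pytest test_features.py`.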

Retrieving Feature Vectors from Online Store

What APIs are supported for reading a row of feature values from the online feature store?
Python or REST API

Operations

Pipeline Orchestration

How are the feature/training/inference pipelines that use the feature store scheduled to run? What orchestration engines are supported?
Any Python or Spark orchestration tool (Airflow, Dagster, AWS Lambda, etc.)
Any Python orchestration tool (Airflow, Dagster, AWS Lambda, etc.)

Offline Feature Store

What data warehouse / lakehouse / object store is supported for storing offline feature data?
Hudi on HopsFS/S3 or External Tables (Snowflake, S3, GCS, JDBC, etc.)
BigQuery

Platform Support

What platforms is the feature store available on?
AWS, Azure, GCP, On-Prem
GCP

Online Feature Store

What operational database is supported for storing online features?
RonDB
Not public; possibly Bigtable

Batch Ingestion

How are features written to the offline feature store?
REST API or Java/Python SDK to run batch ingestion jobs for materialized features in an existing data source

Streaming Ingestion

Does the platform support computing features in a streaming application?
REST API, Python/Java APIs

Join Engine

A join engine can help achieve point-in-time correctness for training data.
BigQuery
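Point-in-time correctness means each training label is joined only with feature values that were already known at the label's timestamp, never with later ones. A minimal in-memory sketch of such an as-of join, assuming a hypothetical per-entity feature history sorted by event time; a join engine such as BigQuery performs the equivalent operation at scale.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional

# Hypothetical feature history: entity_id -> [(event_time, value), ...], sorted by event_time.
FeatureHistory = dict[str, list[tuple[datetime, float]]]
# Hypothetical label rows: (entity_id, label_time, label).
Label = tuple[str, datetime, int]


def as_of_lookup(history: FeatureHistory, entity_id: str, ts: datetime) -> Optional[float]:
    """Latest feature value with event_time <= ts, or None if none existed yet."""
    events = history.get(entity_id, [])
    times = [t for t, _ in events]
    idx = bisect_right(times, ts)  # count of events at or before ts
    return events[idx - 1][1] if idx > 0 else None


def point_in_time_join(labels: list[Label], history: FeatureHistory) -> list[tuple]:
    """Attach to each label the feature value that was known at label_time."""
    return [
        (entity_id, label_time, as_of_lookup(history, entity_id, label_time), label)
        for entity_id, label_time, label in labels
    ]


history: FeatureHistory = {
    "c1": [(datetime(2023, 1, 1), 10.0), (datetime(2023, 1, 5), 20.0)],
}
labels: list[Label] = [
    ("c1", datetime(2023, 1, 3), 0),    # sees 10.0, not the later 20.0
    ("c1", datetime(2023, 1, 6), 1),    # sees 20.0
    ("c1", datetime(2022, 12, 31), 0),  # no value existed yet -> None
]
training_rows = point_in_time_join(labels, history)
```

A plain join on `entity_id` would attach 20.0 to the first label as well, leaking information from the future into the training data.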

Reuse Features

Does the platform support feature encoding (model-dependent transformations) after the data has been stored in a Feature Table/Group?
N/A
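Model-dependent transformations are applied after features are read, so one stored feature can be encoded differently for each model. A minimal sketch, assuming min-max scaling as the hypothetical encoding: statistics are fitted on the training data, and the same fitted object is reused at serving time to avoid training/serving skew.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinMaxScaler:
    """Hypothetical model-dependent encoding fitted on training data."""
    min_value: float
    max_value: float

    @classmethod
    def fit(cls, values: list[float]) -> "MinMaxScaler":
        return cls(min(values), max(values))

    def transform(self, value: float) -> float:
        span = self.max_value - self.min_value
        # Guard against a constant feature, where the span is zero.
        return 0.0 if span == 0 else (value - self.min_value) / span


train_ages = [20.0, 30.0, 60.0]
scaler = MinMaxScaler.fit(train_ages)
training_features = [scaler.transform(a) for a in train_ages]  # [0.0, 0.25, 1.0]
serving_feature = scaler.transform(40.0)  # same statistics at inference: 0.5
```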

Feature Monitoring

Is there support for identifying (and alerting) when there are anomalous changes in a feature as it is updated over time?
Feature ingestion monitoring with Great Expectations and alerting (email or Slack)
Feature ingestion monitoring for feature drift and alerting
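Both entries above describe monitoring at ingestion time with alerting. As an illustration of the idea only (not either product's actual algorithm), the sketch below flags drift when a new batch's mean moves more than a hypothetical threshold of reference standard deviations away from the reference mean, and calls a placeholder `alert` function where an email or Slack notification would go.

```python
import statistics
from typing import Callable


def mean_shift_drift(reference: list[float], batch: list[float], threshold: float = 3.0) -> bool:
    """True if the batch mean lies more than `threshold` reference stdevs from the reference mean."""
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference)  # needs at least two reference values
    if ref_std == 0:
        return statistics.mean(batch) != ref_mean
    return abs(statistics.mean(batch) - ref_mean) / ref_std > threshold


def check_and_alert(name: str, reference: list[float], batch: list[float],
                    alert: Callable[[str], None]) -> bool:
    """Run the drift check and call `alert` (a stand-in for email/Slack) on drift."""
    drifted = mean_shift_drift(reference, batch)
    if drifted:
        alert(f"Drift detected in feature '{name}'")
    return drifted


reference = [10.0, 11.0, 9.0, 10.5, 9.5]
alerts: list[str] = []
check_and_alert("basket_value", reference, [10.2, 9.8, 10.1], alerts.append)  # within range
check_and_alert("basket_value", reference, [25.0, 26.0, 24.0], alerts.append)  # drifted
print(alerts)
```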

Backfill Features

Is there any additional support for specifying a job to fill up a feature table/group with feature values from data source(s) that contains historical data?
Repeated Parameterized Python or Spark Job
Batch Ingestion Job
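Backfilling with a repeated parameterized job means running the same feature pipeline once per slice of historical data. A minimal sketch, assuming daily windows and a hypothetical `run_feature_job(start, end)` callable that would read raw data for the window and write the resulting features.

```python
from datetime import date, timedelta
from typing import Callable, Iterator


def daily_windows(start: date, end: date) -> Iterator[tuple[date, date]]:
    """Yield half-open [day, day + 1) windows covering [start, end)."""
    day = start
    while day < end:
        yield day, day + timedelta(days=1)
        day += timedelta(days=1)


def backfill(start: date, end: date, run_feature_job: Callable[[date, date], None]) -> int:
    """Run the same parameterized feature job once per window; return the number of runs."""
    runs = 0
    for window_start, window_end in daily_windows(start, end):
        run_feature_job(window_start, window_end)
        runs += 1
    return runs


# Stand-in for the hypothetical job: just records the windows it was called with.
calls: list[tuple[date, date]] = []
n = backfill(date(2023, 1, 1), date(2023, 1, 4), lambda s, e: calls.append((s, e)))
print(n)  # 3
```

Half-open windows ensure that no event is processed twice when adjacent windows share a boundary.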

Ranking and Retrieval Architecture Support

If you are using the feature store to build a personalized recommendation or search system, what support is there for vector DB integration?
Out of the box, with OpenSearch k-NN included. External vector databases can also be integrated.
External vector databases can be integrated, such as Vertex Matching Engine
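In a ranking-and-retrieval system, the vector database finds candidate items whose embeddings are nearest to a query embedding, and the feature store then supplies features for ranking those candidates. A brute-force, in-memory sketch of the k-nearest-neighbour step using cosine similarity, with hypothetical item embeddings; vector databases such as OpenSearch k-NN or Vertex Matching Engine typically use approximate nearest-neighbour indexes to scale beyond an exhaustive scan like this one.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], items: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k items most similar to the query (exhaustive scan)."""
    return heapq.nlargest(k, items, key=lambda item_id: cosine(query, items[item_id]))


item_embeddings = {
    "item_a": [1.0, 0.0],
    "item_b": [0.7, 0.7],
    "item_c": [0.0, 1.0],
}
candidates = top_k([1.0, 0.1], item_embeddings, k=2)
print(candidates)  # ['item_a', 'item_b']
```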

Model Registry & Model Serving Support

Is there support for storing the models in a registry and for running the online inference pipelines in a model serving platform?
Yes, with KServe for Model Serving
Vertex Model Registry/Serving

Security & Governance

Access Control

What support is there in the platform for authenticating users and then defining policies?
Platform-level access control and project-membership RBAC inside projects
IAM Roles
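Project-membership RBAC means a user's permissions come from the role they hold in a specific project, and they have no access to projects they are not a member of. A minimal sketch of that check, with hypothetical roles, actions, users, and projects; it does not reproduce Hopsworks' actual role model or the IAM policy language.

```python
from enum import Enum


class Role(Enum):
    OWNER = "owner"
    MEMBER = "member"


# Hypothetical permission table: actions each role may perform inside a project.
PERMISSIONS: dict[Role, set[str]] = {
    Role.OWNER: {"read_features", "write_features", "manage_members"},
    Role.MEMBER: {"read_features", "write_features"},
}

# Hypothetical membership: (user, project) -> role. No entry means no access.
MEMBERSHIP: dict[tuple[str, str], Role] = {
    ("alice", "fraud"): Role.OWNER,
    ("bob", "fraud"): Role.MEMBER,
}


def is_allowed(user: str, project: str, action: str) -> bool:
    """Allow an action only if the user is a project member whose role grants it."""
    role = MEMBERSHIP.get((user, project))
    return role is not None and action in PERMISSIONS[role]
```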

Custom metadata and search

What type of tags can be created - string-based or schematized tags? How is search performed?
Names, descriptions, keywords, schematized tags - with free-text search
N/A

Provenance

What support is there for tracking the lineage of features - what raw data are they computed on, what training data or models are they used in?
N/A

If you would like a more detailed comparison and complete review of the above products, feel free to contact us.