Hopsworks Product Capabilities

Feature Store for AWS

Hopsworks allows you to manage all of your data for machine learning on a feature store platform that integrates with AWS services such as S3, MWAA (Airflow), SageMaker, EMR, and Redshift. Hopsworks also integrates with most other data stores, including Databricks and Snowflake.
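
To give a concrete flavor, here is a minimal sketch of connecting to a Hopsworks Feature Store from Python with the `hopsworks` client library (the host and API key values are illustrative placeholders):

```python
import hopsworks

# Log in to your Hopsworks cluster (host and API key are placeholders).
project = hopsworks.login(
    host="my-cluster.cloud.hopsworks.ai",
    api_key_value="MY_API_KEY",
)

# Get a handle to the project's feature store.
fs = project.get_feature_store()
```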

Why Hopsworks on AWS?

Open-Source

Hopsworks is a versatile open-source platform that offers open APIs and seamless integration with a variety of feature computation engines such as Python, Spark, Beam, Flink, and SQL. It can also work with any model training platform and model inference system.

Highest Performance

Hopsworks Feature Store is built on RonDB (our cloud-native MySQL Cluster), which powers most of the world's network operator databases with seven nines of availability. It is the feature store with the highest throughput and lowest latency available today.
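
As a hedged illustration of what low-latency serving looks like in practice (the feature view name and key below are assumptions), online feature vectors are read through a feature view and served from RonDB:

```python
# Assumes `fs` is a feature store handle (see the login sketch above)
# and that a feature view "transactions_view" (version 1) exists.
fv = fs.get_feature_view(name="transactions_view", version=1)

# Online lookups are served from RonDB, the online store.
feature_vector = fv.get_feature_vector(entry={"cc_num": 4444037300542691})
```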

Python Native

Support for feature engineering in Spark, Flink, and SQL, with unique Python APIs that provide high-performance reading and writing of features from any Python environment (including AWS SageMaker, Kubeflow Pipelines, AWS MWAA (Airflow), Dagster, Astronomer, or notebooks).
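
For example, a minimal sketch of writing features from a plain Python environment (the feature group name and schema are illustrative):

```python
import pandas as pd

# Toy feature data with an illustrative schema.
df = pd.DataFrame({
    "cc_num": [4444037300542691, 4444037300542692],
    "avg_amount_7d": [42.5, 13.1],
    "event_time": pd.to_datetime(["2024-01-01", "2024-01-01"]),
})

# Create (or get) a feature group and write the DataFrame to it;
# `fs` is a feature store handle (see the login sketch above).
fg = fs.get_or_create_feature_group(
    name="cc_features",
    version=1,
    primary_key=["cc_num"],
    event_time="event_time",
    online_enabled=True,
)
fg.insert(df)
```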

Integration with AWS Services (S3, SageMaker, Redshift, EKS)

Hopsworks stores its offline feature groups as Hudi tables in an S3 bucket. Hopsworks also supports external feature groups in its offline store, with connectors for Redshift, Snowflake, and S3/Parquet. This means you can keep your existing feature pipelines that create tables in Snowflake, in Redshift, or as Parquet files on S3, and simply mount those tables as external feature groups in Hopsworks.
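
As a sketch, assuming a pre-configured Redshift storage connector named `redshift_dw` and an existing feature table in Redshift (names and query are assumptions), mounting it as an external feature group looks roughly like this:

```python
# Fetch the storage connector configured for your Redshift cluster.
redshift = fs.get_storage_connector("redshift_dw")

# Mount the existing table as an external feature group;
# the data stays in Redshift and is queried on demand.
ext_fg = fs.create_external_feature_group(
    name="sales_features",                       # assumed name
    version=1,
    query="SELECT * FROM sales_features_table",  # assumed table/query
    storage_connector=redshift,
    primary_key=["store_id"],
)
ext_fg.save()
```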

You can also write feature pipelines in Python/Spark/Flink that read from almost any data source and write to Hopsworks feature groups in Hudi. This is a lower-cost, higher-performance alternative to storing feature tables in a data warehouse. Hudi tables are read from your S3 bucket via a high-performance caching layer, called HopsFS, that provides NVMe read performance for your working set instead of waiting on the slower object store. In Python, we provide lightning-fast access via our FlyingDuck service, which transfers data to Pandas clients in Arrow format (no serialization/deserialization) and uses DuckDB server-side for push-down filtering and point-in-time correct joins across feature groups.
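
To make the server-side joins concrete, here is a hedged sketch of joining two feature groups into a feature view and reading training data back into Pandas (the feature group and view names are assumptions):

```python
# Select features from two feature groups; the join and any
# point-in-time-correct filtering are pushed down server-side.
trans_fg = fs.get_feature_group("cc_features", version=1)
profile_fg = fs.get_feature_group("profile_features", version=1)

query = trans_fg.select_all().join(profile_fg.select_all())

fv = fs.create_feature_view(name="fraud_view", version=1, query=query)

# Results are streamed back to the Pandas client in Arrow format.
X, y = fv.training_data()
```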

Hopsworks can also be connected to an EKS cluster, enabling you, from within Hopsworks, to run Python jobs, Jupyter notebooks, and model serving with KServe on the managed Kubernetes cluster.
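
For instance, a hedged sketch of deploying a registered model to the connected EKS cluster with KServe (the model and deployment names are assumptions):

```python
# Assumes a trained model was already registered as "fraud_model".
mr = project.get_model_registry()
model = mr.get_model("fraud_model", version=1)

# Deploy the model with KServe on the connected EKS cluster.
deployment = model.deploy(name="fraudmodel")
deployment.start()

# Send a test inference request to the deployment.
prediction = deployment.predict(inputs=[[4444037300542691]])
```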

Hopsworks.ai for managed Hopsworks Clusters on AWS

Hopsworks.ai manages your Hopsworks cluster securely inside your own AWS account. In your AWS account, you create a cross-account IAM role with the appropriate permissions (see our docs for details), after which the Hopsworks control plane can manage your Hopsworks clusters for you, leaving you free to put your models to work rather than maintaining infrastructure. The Hopsworks.ai control plane supports the following cluster commands using either the UI or a REST API (a scripting sketch follows the list):

  • cluster create/destroy (Terraform support available)
  • cluster stop/start
  • cluster backup/restore
  • cluster upgrade
  • online add/remove compute workers (optionally using spot pricing)
  • online add/remove feature store query nodes 
  • online scale up/down RonDB data nodes
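
As a rough scripting sketch only — the base URL, endpoints, and payloads below are assumptions, not the documented Hopsworks.ai API contract (see the docs for the real one) — the same commands can be driven over the REST API:

```python
import requests

BASE = "https://api.hopsworks.ai/api"   # assumed base URL
HEADERS = {"x-api-key": "MY_API_KEY"}   # assumed auth header

# Stop a running cluster (hypothetical endpoint).
requests.put(f"{BASE}/clusters/MY_CLUSTER_ID/stop", headers=HEADERS)

# Start it again later (hypothetical endpoint).
requests.put(f"{BASE}/clusters/MY_CLUSTER_ID/start", headers=HEADERS)
```
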
Hopsworks & AWS

Hopsworks.ai manages your clusters securely inside your own AWS account.

A Hopsworks cluster running in your cloud account periodically sends heartbeats (containing minimal information about cluster status) to the Hopsworks control plane so that it knows the operational status of your cluster. To make debugging and monitoring by Hopsworks engineers easier, you can also optionally export your monitoring metrics and logs to Hopsworks.ai.

Using your managed Hopsworks cluster on AWS

In Hopsworks.ai, you can configure your Hopsworks cluster(s) to support single sign-on (SSO) via, for example, Active Directory, LDAP, or OAuth2. This way, members of your organization can simply be given a URL to the cluster and will authenticate via your existing SSO mechanism. Hopsworks users (data scientists, ML/data engineers, analysts) work directly with the Hopsworks cluster and do not need access to the Hopsworks.ai control plane. Only administrators, responsible for managing the size and lifecycle of your clusters, require access to Hopsworks.ai.

Other Hopsworks Capabilities you might find interesting