
Feature Pipelines in Production with Hopsworks

Code, Deployment & Monitoring
April 30, 2024
3 min read
Fabio Buso
VP Engineering
Hopsworks

TL;DR

Feature pipelines only deliver value when they run reliably in production. This post walks through productionizing them with Hopsworks: pulling pipeline code from Git, running it as jobs on native or external compute, scheduling and backfilling executions, and setting up alerts so failures are caught quickly.

Introduction

In this post, we will look at how to put feature pipelines into production using Hopsworks. Feature pipelines are the programs responsible for computing features and registering them with the feature store, which in turn makes those features available to data scientists for training models and to production models for making predictions.

Productionizing these pipelines means features are refreshed on a regular schedule, so new models can be retrained on fresh data and production models can make sharper predictions. As we've seen in previous posts, Hopsworks supports frameworks such as Pandas, Spark, and Flink for building pipelines, and it also lets us integrate external pipelines built on platforms like Snowflake. Here we focus on the productionization steps: deployment, scheduling, and monitoring.

Managing Codebases

Code for generating features typically resides in repositories like GitHub, GitLab, or BitBucket. Hopsworks integrates with these tools to automatically pull our repositories into its environment for execution.

For example, once GitHub credentials are configured, a cloned repository containing feature pipelines is available in Hopsworks. We can directly run jobs from this codebase with a single click.
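As a rough sketch of what this looks like with the hopsworks Python client (assuming the GitHub credentials are already configured in Hopsworks; the repository URL, target path, and variable names below are purely illustrative and method names may vary between client versions):

import hopsworks

# Connect to the Hopsworks project (host and API key resolved from the environment)
project = hopsworks.login()

# Clone the repository that contains our feature pipeline code into the project
git_api = project.get_git_api()
repo = git_api.clone(
    "https://github.com/my-org/feature-pipelines.git",  # illustrative repository URL
    "Resources/feature-pipelines",                       # target path inside the project
    provider="GitHub",
)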

Executing Pipelines

Hopsworks offers flexibility in where pipelines execute. It provides native compute for Spark, Flink, or Python pipelines. Alternatively, we can use existing infrastructure like Databricks or custom Python environments.

Creating a new job in Hopsworks

We will focus on executing pipelines natively within Hopsworks by creating jobs, scheduling them, and monitoring their executions. Jobs can be created from the UI or programmatically through the APIs.
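As a minimal sketch of the programmatic route (assuming the hopsworks Python client and a pipeline script already cloned or uploaded into the project; the job name, script path, and configuration keys are illustrative):

import hopsworks

# Connect to the project and get the Jobs API
project = hopsworks.login()
jobs_api = project.get_jobs_api()

# Start from the default configuration for a Python job and point it at the
# pipeline script we want to run (path is illustrative)
config = jobs_api.get_configuration("PYTHON")
config["appPath"] = "/Resources/feature-pipelines/daily_features.py"

# Register the job so it can be run, scheduled, and monitored
job = jobs_api.create_job("daily_feature_pipeline", config)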

Once the job exists, we can attach a schedule to it using Job.schedule(cron_expression, start_time=None, end_time=None):

from datetime import datetime, timezone

# Schedule the job to run every five minutes, starting now (Quartz cron syntax)
job.schedule(
    cron_expression="0 */5 * ? * * *",
    start_time=datetime.now(tz=timezone.utc)
)

# Retrieve the next execution time
print(job.job_schedule.next_execution_date_time)

The scheduler supports cron expressions for advanced scheduling. An interesting capability is time travel: we can set the start time in the past to backfill historical data, and Hopsworks will execute the missed runs serially, as if the job had been on that schedule all along. This is useful for creating training data or warming up production models with past behavior.
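A sketch of such a backfill, continuing the example above (the cron expression and start date are illustrative):

from datetime import datetime, timezone

# Start the schedule in the past: Hopsworks runs the missed executions serially,
# one per scheduled interval, until the job catches up with the present
job.schedule(
    cron_expression="0 0 0 * * ? *",                       # daily at midnight (Quartz syntax)
    start_time=datetime(2024, 1, 1, tzinfo=timezone.utc)   # illustrative past start date
)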

Monitoring Failures

It's critical to know when production pipelines fail unexpectedly. For this purpose, Hopsworks integrates with email, Slack, and Kafka to deliver alerts.

Monitoring in Hopsworks

For example, with Slack alerts:

  • A Hopsworks administrator configures the Slack webhook
  • We define "receivers" per project: who gets notified for which events
  • When a pipeline fails, a critical Slack alert with the failure metadata is sent to the configured receivers

More advanced data and quality monitoring is also available in Hopsworks but not covered here.

Summary

In summary, we looked at the end-to-end productionization of feature pipelines with Hopsworks: managing code, deployment, scheduling, and monitoring. Together these keep features fresh, so models can keep improving day after day on top of a reliable, observable pipeline.

Watch the full video on how to productionize feature pipelines with Hopsworks.

