In Hopsworks, ML pipelines and feature engineering can be done entirely in Python, using the Hopsworks library together with any framework or library of your choice. You can work from an external environment (e.g., Google Colab or any other notebook environment) or within a Hopsworks enterprise environment.
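As a rough illustration of working from an external Python environment, the sketch below writes a pandas DataFrame to a feature group using the `hopsworks` client library. The function name, the feature group name, and the `customer_id` primary key are hypothetical placeholders chosen for this example, and the snippet assumes the `hopsworks` package is installed and an API key is available.

```python
def write_features_to_hopsworks(df, name="customer_features", version=1):
    """Insert a pandas DataFrame into a Hopsworks feature group (sketch)."""
    # Imported lazily so this module loads even where the client
    # library is not installed (assumes `pip install hopsworks`).
    import hopsworks

    # Connects to a Hopsworks project; expects an API key to be
    # available (for example via an interactive prompt).
    project = hopsworks.login()
    fs = project.get_feature_store()

    # "customer_id" is an assumed primary key for this illustration.
    fg = fs.get_or_create_feature_group(
        name=name,
        version=version,
        primary_key=["customer_id"],
        description="Illustrative customer features",
    )
    fg.insert(df)
    return fg
```

Calling `write_features_to_hopsworks(features_df)` from a notebook would then connect to the project and insert the rows, provided the DataFrame contains the assumed `customer_id` column.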
Why use Python with Hopsworks?
Dataframes as first-class citizens
Dataframes make data manipulation intuitive, enabling easy transformation and cleaning of data for feature engineering. With the rich set of APIs and functions available in Python, dataframes provide a versatile and powerful way to explore and preprocess data.
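To make the dataframe workflow concrete, here is a minimal sketch using pandas. The dataset, column names, and the chosen features are invented for illustration; they are not taken from Hopsworks.

```python
import pandas as pd

# Hypothetical raw transactions; column names are illustrative only.
raw = pd.DataFrame(
    {
        "customer_id": [1, 1, 2, 2, 3],
        "amount": [20.0, None, 35.5, 12.0, 8.25],
        "category": ["Grocery", "grocery ", "Travel", "travel", "GROCERY"],
    }
)

# Cleaning: fill missing amounts with the median and normalize labels.
clean = raw.assign(
    amount=raw["amount"].fillna(raw["amount"].median()),
    category=raw["category"].str.strip().str.lower(),
)

# Feature engineering: aggregate spend and transaction count per customer.
features = (
    clean.groupby("customer_id")
    .agg(total_spend=("amount", "sum"), n_transactions=("amount", "count"))
    .reset_index()
)
print(features)
```

The resulting `features` dataframe has one row per customer and is the kind of table that would typically be written to a feature store.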
Python-native for data science
A Python-native environment enables data scientists to leverage their Python skills to create robust operational ML pipelines and feature engineering workflows, while also allowing them to use their preferred Python frameworks and libraries.
Bring your code, frameworks and pipelines
Hopsworks' flexible platform allows users to bring their existing Python code, frameworks, and pipelines into the environment, enabling seamless integration and collaboration across teams.
Think Python first
Scripts and notebooks: Hopsworks' Python-first approach is designed to improve usability and collaboration across all data teams.
Practical and versatile: With its vast number of available libraries, Python is the preferred language for data science and ML engineering, including MLOps.
Ease of use and community: Leverage the vast ecosystem and communities of developers. Learn more about the community at NumFOCUS.
Feature, Training and Inference Pipelines
The Feature, Training and Inference (FTI) pipeline pattern is a powerful concept for building scalable and maintainable ML pipelines. Hopsworks fully supports the FTI pattern, allowing users to seamlessly transition from feature engineering through training to inference. By breaking down the pipeline into three distinct stages, users can optimize each stage separately and avoid costly mistakes. Hopsworks' feature store is designed to support the FTI pattern by providing a centralized location to store and manage features, making it easy to reuse and share features across pipelines.
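The three-stage split can be sketched in plain Python. This is a toy illustration under stated assumptions: the dictionary `feature_store` is a hypothetical in-memory stand-in for a real feature store (it is not the Hopsworks API), and the "model" is a simple threshold chosen only to keep the example short.

```python
from statistics import mean

# Hypothetical in-memory stand-in for a feature store; NOT the Hopsworks API.
feature_store = {}


def feature_pipeline(raw_rows):
    """Stage 1: turn raw rows into features and write them to the store."""
    for row in raw_rows:
        feature_store[row["id"]] = {
            "amount_k": row["amount"] / 1000.0,  # engineered feature
            "label": row["label"],
        }


def training_pipeline():
    """Stage 2: read features from the store and fit a toy threshold model."""
    pos = [f["amount_k"] for f in feature_store.values() if f["label"] == 1]
    neg = [f["amount_k"] for f in feature_store.values() if f["label"] == 0]
    # Toy model: threshold halfway between the two class means.
    return {"threshold": (mean(pos) + mean(neg)) / 2}


def inference_pipeline(model, entity_id):
    """Stage 3: look up stored features for an entity and predict."""
    features = feature_store[entity_id]
    return int(features["amount_k"] > model["threshold"])


raw = [
    {"id": 1, "amount": 500, "label": 0},
    {"id": 2, "amount": 700, "label": 0},
    {"id": 3, "amount": 4000, "label": 1},
    {"id": 4, "amount": 5000, "label": 1},
]
feature_pipeline(raw)
model = training_pipeline()
predictions = {i: inference_pipeline(model, i) for i in (1, 3)}
print(predictions)
```

Because each stage communicates only through the shared store and the model object, any one of them can be rewritten, rescheduled, or scaled without changing the other two.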