Nearly a year ago, the Hopsworks team embarked on a journey to migrate its infrastructure to Kubernetes. In this article we describe three main pillars of our Kubernetes migration.
We describe the capabilities that need to be added to the Lakehouse to make it an AI Lakehouse that can support building and operating AI-enabled batch and real-time applications as well as LLM applications.
We present how Hopsworks leverages its time-travel capabilities for feature groups to support reproducible creation of training datasets using metadata.
Learn more about how Hopsworks (RonDB) outperforms AWS Sagemaker and GCP Vertex in latency for real-time AI databases, based on a peer-reviewed SIGMOD 2024 benchmark.
Read how Hopsworks generates temporal queries from Python code, and how a native query engine built on Arrow can massively outperform JDBC/ODBC APIs.
This article introduces a taxonomy for data transformations in AI applications that is fundamental for any AI system that wants to reuse feature data in more than one model.
We present a unified software architecture for batch, real-time, and LLM AI systems that is based on a shared storage layer and a decomposition of machine learning pipelines.
The third edition of the LLM Makerspace dived into an example of an LLM system for detecting check fraud.
This article covers the different aspects of Job Scheduling in Hopsworks, including how simple jobs can be scheduled through the Hopsworks UI by non-technical users.
A summary from our LLM Makerspace event where we built our own PDF Search Tool using RAG and fine-tuning in one platform. Follow along on the journey to build an LLM application from scratch.
The decision to build versus buy a feature store has both strategic and technical components to consider, as it impacts both cost and technical debt.
This is a summary of our latest LLM Makerspace event where we pulled back the curtain on an exciting paradigm in AI – function calling with LLMs.
We go through the most common error messages in Pandas and offer solutions to these errors, as well as provide efficiency tips for Pandas code.
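The post's full list of errors isn't reproduced here, but a minimal sketch of one frequent pitfall it alludes to — chained indexing triggering `SettingWithCopyWarning` — with the idiomatic `.loc` fix and a vectorization tip (the DataFrame and column names are illustrative, not from the article):

```python
import pandas as pd

df = pd.DataFrame({"age": [25, 62, 41], "income": [30_000, 55_000, 48_000]})

# Pitfall: chained indexing like df[df["age"] > 40]["income"] = ... raises
# SettingWithCopyWarning and may silently fail to update df.
# Fix: a single .loc call combining the row mask and column selection.
df.loc[df["age"] > 40, "income"] = df.loc[df["age"] > 40, "income"] + 1_000

# Efficiency tip: prefer vectorized column operations over row-wise .apply
df["income_k"] = df["income"] / 1_000
```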
Read about the advantages of using DBT for data warehouses and how it's positioned as a preferred solution for many data analytics and engineering teams.
We review Python libraries for feature engineering, such as Pandas, Pandas 2 and Polars, evaluate their performance, and explore how they power machine learning use cases.
Delve into the profound implications of machine learning embeddings, their diverse applications, and their crucial role in reshaping the way we interact with data.
We explain a new framework for ML systems as three independent ML pipelines: feature pipelines, training pipelines, and inference pipelines, creating a unified MLOps architecture.
Unlock the power of Apache Airflow in the context of feature engineering. We will delve into building a feature pipeline using Airflow, focusing on two tasks: feature binning and aggregations.
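The Airflow DAG itself isn't reproduced here; as a hedged sketch, the two tasks the post names — feature binning and aggregations — might each wrap Pandas logic along these lines (the `transactions` schema and bin edges are hypothetical):

```python
import pandas as pd

transactions = pd.DataFrame({
    "user_id": [1, 1, 2],
    "amount": [5.0, 120.0, 40.0],
})

# Task 1: feature binning — bucket raw amounts into coarse categories
transactions["amount_bin"] = pd.cut(
    transactions["amount"],
    bins=[0, 10, 100, float("inf")],
    labels=["small", "medium", "large"],
)

# Task 2: aggregations — per-user summary features
user_features = (
    transactions.groupby("user_id")["amount"]
    .agg(total="sum", avg="mean")
    .reset_index()
)
```

In an Airflow pipeline, each function would typically become its own task (e.g. via `PythonOperator` or the TaskFlow API), with the binned output passed downstream to the aggregation step.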
An ML model’s ability to learn and read data patterns largely depends on feature quality. With frameworks such as FeatureTools, ML practitioners can automate the feature engineering process.
In this article, we outline how we leveraged ArrowFlight with DuckDB to build a new service that massively improves the performance of Python clients reading lakehouse data from the Feature Store.
Find out how to use Flink to compute real-time features and make them available to online models within seconds using Hopsworks.
Explore the power of feature engineering for categorical features using Pandas. Learn essential techniques for handling categorical variables and creating new features.
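As a minimal sketch of the kind of techniques the post covers — one-hot encoding a low-cardinality column and ordinal-encoding an ordered one — with illustrative column names not taken from the article:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Stockholm", "London", "Stockholm"],
    "plan": ["free", "pro", "pro"],
})

# One-hot encode a low-cardinality categorical feature
one_hot = pd.get_dummies(df["city"], prefix="city")

# Ordinal-encode a categorical with a meaningful order
plan_order = pd.CategoricalDtype(["free", "pro"], ordered=True)
df["plan_code"] = df["plan"].astype(plan_order).cat.codes
```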
Learn more about how Hopsworks stores both data and validation artifacts, enabling easy monitoring on the Feature Group UI page.
In this blog, we introduce the Hopsworks Connector API, which is used to mount a table in an external data source as an external feature group in Hopsworks.
Learn how the Hopsworks feature store APIs work and what it takes to go from a Pandas DataFrame to features used by models for both training and inference.
In this blog post we showcase the results of a study that examined point-in-time join optimization using Apache Spark in Hopsworks.
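The study's Spark implementation isn't reproduced here, but the core idea of a point-in-time join — for each labeled event, pick the latest feature value committed at or before the event time, so no future data leaks into training — can be sketched in Pandas with `merge_asof` (the schemas below are hypothetical):

```python
import pandas as pd

# Label events: (entity id, event timestamp, label)
labels = pd.DataFrame({
    "id": [1, 1],
    "ts": pd.to_datetime(["2024-01-05", "2024-01-20"]),
    "label": [0, 1],
})

# Feature values with their commit timestamps
features = pd.DataFrame({
    "id": [1, 1, 1],
    "ts": pd.to_datetime(["2024-01-01", "2024-01-10", "2024-01-18"]),
    "balance": [100, 150, 120],
})

# Point-in-time join: latest feature row with ts <= event ts, per entity
training = pd.merge_asof(
    labels.sort_values("ts"),
    features.sort_values("ts"),
    on="ts", by="id", direction="backward",
)
```

In Spark, the same semantics are typically expressed as a join-plus-window (rank by feature timestamp per event and keep the most recent), which is exactly the query shape whose optimization the study examines.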
Programmers know data types, but what is a feature type to a programmer new to machine learning, given no mainstream programming language has native support for them?
Operational machine learning requires the offline and online testing of both features and models. In this article, we show you how to design, build, and run tests for features.
We are introducing a new feature in the Hopsworks UI - feature code preview - the ability to view the notebook used to create a Feature Group or Training Dataset.
In this blog post we demonstrate how to build a machine learning pipeline with real-world data in order to develop an iceberg classification model.
Hopsworks brings support for scale-out AI with the ExtremeEarth project, which focuses on the pressing issues of food security and sea mapping.
This tutorial gives an overview of how to work with Jupyter on the platform and train a state-of-the-art ML model using the fastai Python library.
Many developers believe S3 is the "end of file system history": it is impossible to build a file/object storage system on AWS that can compete with S3 on cost. But what if you could build on top of S3?
Read how Hopsworks supports easy hyperparameter optimization (both synchronous and asynchronous search) and distributed training using PySpark.
Hopsworks is replacing Horovod with Keras/TensorFlow’s new CollectiveAllReduceStrategy, part of the Keras/TensorFlow Estimator framework.