
Training-Inference Skew

What is training-inference skew?

Model-dependent transformations are applied in both the training and inference pipelines. Training-inference skew occurs when a transformation is implemented differently, even if only slightly, in the training pipeline and the inference pipeline. It can silently degrade model performance and is a hard bug to detect.
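To make the definition concrete, here is a minimal, hypothetical sketch: the training pipeline standardizes a numeric feature using the population standard deviation, while a separately written inference pipeline uses the sample standard deviation. The feature, its values, and the function names are illustrative assumptions, not taken from any particular library or system.

```python
import statistics

# Hypothetical training data for one numeric feature (e.g. a transaction amount).
train_amounts = [10.0, 20.0, 30.0, 40.0, 50.0]

# Training pipeline: standardize with the POPULATION standard deviation.
TRAIN_MEAN = statistics.mean(train_amounts)
TRAIN_STD = statistics.pstdev(train_amounts)

def scale_for_training(x: float) -> float:
    return (x - TRAIN_MEAN) / TRAIN_STD

# Inference pipeline, reimplemented separately: it uses the SAMPLE standard
# deviation instead. The code looks almost identical, so the difference is easy to miss.
INFER_STD = statistics.stdev(train_amounts)

def scale_for_inference_skewed(x: float) -> float:
    return (x - TRAIN_MEAN) / INFER_STD

# The same raw input produces different feature values in the two pipelines.
print(scale_for_training(50.0))          # about 1.414
print(scale_for_inference_skewed(50.0))  # about 1.265
```

Neither function raises an error and both outputs look plausible, which is why this kind of skew tends to show up only as lower accuracy in production. Calling one shared transformation function from both pipelines removes the discrepancy.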

Why is it important to watch for training-inference skew?

Training-inference skew is a discrepancy that arises when the data preprocessing or feature transformation steps differ between the training and inference pipelines. Such inconsistencies can lead to degraded model performance and hard-to-detect issues in real-world applications. It is crucial to watch for training-inference skew for several reasons:

Model performance: Discrepancies between training and inference pipelines can result in the model performing poorly when deployed, even if it performed well during training and validation.
Debugging and troubleshooting: Training-inference skew can be challenging to identify and diagnose, as the issues often stem from subtle differences in the implementation of data preprocessing or feature transformations.
Reproducibility: Ensuring that the same data preprocessing and feature transformation steps are used in both pipelines is essential for achieving reproducible results.
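Because skew rarely raises an error, one practical safeguard is a parity check run as a test: pass the same raw inputs through the transformation used by each pipeline and flag any outputs that differ. The sketch below assumes both transformations can be called as plain Python functions; `find_skew`, `log_amount`, and `log_amount_reimplemented` are hypothetical names chosen for illustration.

```python
import math
from typing import Callable, Iterable, List, Tuple

Transform = Callable[[float], float]

def find_skew(train_fn: Transform, infer_fn: Transform,
              raw_inputs: Iterable[float],
              rel_tol: float = 1e-9) -> List[Tuple[float, float, float]]:
    """Return (input, training output, inference output) for every input
    where the two transformations disagree beyond the given tolerance."""
    mismatches = []
    for x in raw_inputs:
        a, b = train_fn(x), infer_fn(x)
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-12):
            mismatches.append((x, a, b))
    return mismatches

# Hypothetical model-dependent transformation used by the training pipeline.
def log_amount(x: float) -> float:
    return math.log1p(x)  # log(1 + x), well defined at x = 0

# Hypothetical separate reimplementation in the inference pipeline.
def log_amount_reimplemented(x: float) -> float:
    return math.log(x + 1e-6)  # subtly different formula

samples = [0.0, 1.0, 100.0]
print(find_skew(log_amount, log_amount, samples))                # []
print(len(find_skew(log_amount, log_amount_reimplemented, samples)))  # 3
```

When both pipelines share `log_amount`, the check returns an empty list; comparing against the reimplemented version flags all three inputs. Running such a check in CI makes a silent discrepancy fail loudly before deployment.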

