Model Bias

“Algorithms trained on biased data are biased. But learning algorithms themselves are not biased.” - Yann LeCun

What is model bias in machine learning?

Model bias refers to the presence of systematic errors in a model that can cause it to consistently make incorrect predictions. These errors can arise from many sources, including the selection of the training data, the choice of features used to build the model, or the algorithm used to train the model.

What types of model bias are there?

Common forms of model bias include selection bias, measurement bias, and algorithmic bias. Selection bias occurs when the training data is not representative of the population being modeled, leading to biased predictions. Measurement bias occurs when the measurements used to train the model are inaccurate or imprecise, leading to biased estimates. Algorithmic bias occurs when the algorithm used to train the model produces biased predictions due to inherent biases in the algorithm or the data used to train it.
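As a rough illustration of how selection bias might be spotted, the sketch below compares each group's share of the training data with its share of the population being modeled. The function names, the `urban`/`rural` groups, and the reference shares are all hypothetical and chosen only for this example; in practice the reference shares would have to come from a trusted source such as census or production traffic data.

```python
from collections import Counter


def group_shares(groups):
    """Return the fraction of rows that belong to each group."""
    counts = Counter(groups)
    total = len(groups)
    return {group: count / total for group, count in counts.items()}


def representation_gaps(train_groups, reference_shares):
    """Return training share minus reference share for every reference group.

    A positive gap means the group is over-represented in the training data,
    a negative gap means it is under-represented.
    """
    train_shares = group_shares(train_groups)
    return {
        group: train_shares.get(group, 0.0) - share
        for group, share in reference_shares.items()
    }


# Hypothetical training data: 9 urban rows and 1 rural row.
train_groups = ["urban"] * 9 + ["rural"] * 1

# Hypothetical population shares (an assumption made for this example).
reference_shares = {"urban": 0.6, "rural": 0.4}

gaps = representation_gaps(train_groups, reference_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

A large negative gap for a group suggests the training data under-samples it, which is one possible sign of selection bias. How large a gap is acceptable depends on the application.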

How do you prevent bias in ML models?

You can prevent selection bias by ensuring that your training data is representative of the different groups your model will make predictions for. You can also use evaluation sets (slices of your test set containing data from groups identified as being at risk of bias) to evaluate your model's performance across different groups (e.g., by gender, ethnicity, or location) and to identify any performance differences between those groups.
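A minimal sketch of such a sliced evaluation is shown below, assuming a classification model whose predictions are already available as a list. The helper names, the toy labels, and the group identifiers `a` and `b` are hypothetical; accuracy is used only as an example metric, and any other metric could be computed per slice in the same way.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy}, computed separately on each slice of the test set."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def max_accuracy_gap(per_group):
    """Return the difference between the best and worst per-group accuracy."""
    return max(per_group.values()) - min(per_group.values())


# Toy test set: true labels, model predictions, and the group of each row.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

per_group = accuracy_by_group(y_true, y_pred, groups)
gap = max_accuracy_gap(per_group)

for group, accuracy in per_group.items():
    print(f"group {group}: accuracy {accuracy:.2f}")
print(f"largest accuracy gap between groups: {gap:.2f}")
```

A noticeable gap between groups does not prove the model is biased, but it flags slices that deserve a closer look, for example by collecting more data for the weaker group.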

