From Horse Racing Calculation to Anti-Cheat Lookups

We have been working with Paddy Power, an Irish gambling company, to help them calculate horse racing odds using two different models: one for All Weather races and one for Flat races. Paddy Power also uses Hopsworks with batch predictions as an anti-cheating system, making sure that no user “knows more than the betting company”.

The Challenges:

Data scientists were unable to easily discover and experiment with existing features and pipelines
Features could not be shared across models
Features were difficult to re-use between the All Weather model and the Flat racing model
The infrastructure depended heavily on a small, dedicated team to maintain it
The data warehouse did not provide feature statistics or metadata, slowing down feature engineering
Python, the preferred programming language of most data scientists, was not supported in Redshift
There was no centralized storage or sharing of features
Maintenance was an issue
Teams lacked the ability to collaborate

Why Hopsworks?

Paddy Power integrated the Hopsworks Feature Store with its existing AWS SageMaker architecture, using it as a repository of features that are ready for training models.
Data scientists and analysts can now browse available features, inspect their metadata, investigate pre-computed statistics, and preview sample feature values.
Hopsworks also allows better centralization and accessibility of data as well as collaboration between teams.
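To make the idea of a browsable feature repository concrete, here is a minimal in-memory sketch in plain Python. It is not the Hopsworks API: the `FeatureEntry` and `FeatureCatalogue` classes, the feature names, and the sample values are all hypothetical and only illustrate what "browse features, inspect metadata, investigate statistics, and preview values" can look like.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class FeatureEntry:
    """One feature plus the metadata a catalogue might expose (hypothetical)."""
    name: str
    description: str
    dtype: str
    values: list

    def stats(self):
        # A real feature store would pre-compute these; here they are
        # calculated on demand to keep the sketch short.
        return {
            "count": len(self.values),
            "min": min(self.values),
            "max": max(self.values),
            "mean": mean(self.values),
            "std": pstdev(self.values),
        }

    def preview(self, n=3):
        # Return the first n sample values.
        return self.values[:n]


class FeatureCatalogue:
    """Toy central registry that several teams could search (hypothetical)."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def search(self, keyword):
        # Match the keyword against feature names and descriptions.
        kw = keyword.lower()
        return sorted(
            name for name, e in self._entries.items()
            if kw in name.lower() or kw in e.description.lower()
        )

    def get(self, name):
        return self._entries[name]


catalogue = FeatureCatalogue()
catalogue.register(FeatureEntry(
    "horse_win_rate", "Share of past races won by the horse",
    "float", [0.25, 0.5, 0.0, 0.75]))
catalogue.register(FeatureEntry(
    "jockey_race_count", "Number of races ridden by the jockey",
    "int", [10, 20, 30, 40]))

print(catalogue.search("races"))
print(catalogue.get("jockey_race_count").stats())
```

Because every feature is registered once with its description and statistics, an analyst can search by keyword instead of reading pipeline code to find out what already exists.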

Results:

Improved Feature Quality

Improved models that generate more revenue

Faster Feature Engineering

Access to statistics and metadata, decreasing the time to generate training datasets

Exploratory Data Analysis

Discover pre-computed features, types of those features, descriptive statistics and the distribution of feature values

Feature Reusability

Previously engineered and quality-assured features become available to be reused - ready for training

Consolidated Feature Engineering Pipelines

Feature engineering code is not duplicated across applications; instead, a single pipeline computes features for both serving and training

Faster Models to Production

Data scientists can concentrate on improving models rather than on the complex infrastructure needed to keep training and serving pipelines in sync
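The consolidated-pipeline result can be sketched as a single feature function that both the training path and the serving path call. This is an illustrative sketch only: the function names, the input fields (`horse_starts`, `horse_wins`, `surface`, `won`) and the two features are assumptions made for the example, not Paddy Power's actual features or code.

```python
def compute_features(race):
    """Single source of truth for feature logic (hypothetical features)."""
    starts = race["horse_starts"]
    wins = race["horse_wins"]
    return {
        # Guard against division by zero for horses with no previous starts.
        "horse_win_rate": wins / starts if starts else 0.0,
        "is_all_weather": 1 if race["surface"] == "all_weather" else 0,
    }


def build_training_rows(historical_races):
    """Training path: shared features plus the known outcome as the label."""
    return [{**compute_features(r), "label": r["won"]} for r in historical_races]


def build_serving_vector(upcoming_race):
    """Serving path: exactly the same features, without a label."""
    return compute_features(upcoming_race)


history = [
    {"horse_starts": 4, "horse_wins": 1, "surface": "all_weather", "won": 1},
    {"horse_starts": 10, "horse_wins": 5, "surface": "flat", "won": 0},
]
upcoming = {"horse_starts": 0, "horse_wins": 0, "surface": "flat"}

training_rows = build_training_rows(history)
serving_vector = build_serving_vector(upcoming)
```

Since both paths go through `compute_features`, a change to a feature definition reaches training and serving at the same time, which is what keeps the two pipelines in sync.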

Website:

PaddyPower

Replacing SQL-based feature pipelines with Python and improving the speed of feature engineering: Paddy Power determines betting prices with the help of predictions generated from ML models.
