From Horse Racing Calculations to Anti-Cheat Lookups

We have been working with Paddy Power, an Irish gambling company, to help them calculate odds for horse racing using two models: one for All Weather races and one for Flat races. Paddy Power also uses Hopsworks batch predictions as an anti-cheating system, making sure that no user “knows more than the betting company”.

The Aim: 

  • Data scientists could not easily discover and experiment with existing features and pipelines
  • Features could not be shared across models
  • Features were difficult to re-use between the All Weather model and the Flat racing model
  • The infrastructure depended heavily on a small, dedicated team to maintain it
  • The data warehouse did not provide feature statistics or metadata, which slowed down feature engineering
  • Python, the preferred programming language of most data scientists, was not supported in Redshift
  • There was no centralized storage or sharing of features
  • Maintenance was a problem
  • Teams lacked the ability to collaborate

Why Hopsworks? 

  • Paddy Power integrated the Hopsworks Feature Store with its existing AWS SageMaker architecture as a repository of features ready to be used for training models.
  • Data scientists and analysts can now browse available features, inspect their metadata, investigate pre-computed statistics, and preview sample feature values. 
  • Hopsworks also allows better centralization and accessibility of data as well as collaboration between teams.
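
To make the idea of browsing metadata, pre-computed statistics, and sample values concrete, here is a minimal sketch in plain Python. It does not use the Hopsworks API; the function name `profile_feature`, the feature name `win_rate`, and the example values are hypothetical and only illustrate the kind of summary a feature store can expose for each feature.

```python
import statistics


def profile_feature(name: str, values: list[float], preview: int = 3) -> dict:
    """Summarise one feature column: type, basic statistics, and a sample.

    Hypothetical helper for illustration; not part of any feature store API.
    """
    if not values:
        raise ValueError("cannot profile an empty feature column")
    return {
        "name": name,
        "type": type(values[0]).__name__,   # metadata: value type
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        # Sample standard deviation needs at least two values.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "sample": values[:preview],          # preview of feature values
    }


if __name__ == "__main__":
    # Illustrative values only, not real Paddy Power data.
    print(profile_feature("win_rate", [0.1, 0.2, 0.3, 0.4]))
```

Computing such a profile once and storing it next to the feature means data scientists can judge whether a feature is useful without first querying the raw data.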

Results: 

Improved Feature Quality

Improved models that generate more revenue

Faster Feature Engineering

Access to statistics and metadata, decreasing the time to generate training datasets

Exploratory Data Analysis

Discover pre-computed features, their types, descriptive statistics, and the distribution of feature values

Feature Reusability

Previously engineered and quality-assured features are available for reuse, ready for training

Consolidated Feature Engineering Pipelines

Feature engineering code is not duplicated across applications; instead, a single pipeline computes features for both serving and training
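
The single-pipeline idea can be sketched as one feature function that both the training path and the serving path call. This is a minimal illustration in plain Python, not Paddy Power's actual pipeline: the record fields and the feature names `days_since_last_race` and `win_rate` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RaceRecord:
    """Hypothetical raw record for one horse; field names are illustrative."""
    horse_id: str
    last_race: date
    starts: int
    wins: int


def compute_features(record: RaceRecord, as_of: date) -> dict:
    """The single feature definition shared by training and serving."""
    return {
        "days_since_last_race": (as_of - record.last_race).days,
        # Guard against division by zero for horses with no starts.
        "win_rate": record.wins / record.starts if record.starts else 0.0,
    }


def build_training_rows(history: list[tuple[RaceRecord, date]]) -> list[dict]:
    """Batch path: apply the shared function to historical records."""
    return [compute_features(rec, as_of) for rec, as_of in history]


def build_serving_row(record: RaceRecord, today: date) -> dict:
    """Online path: the same function, applied to one live record."""
    return compute_features(record, today)
```

Because both paths go through `compute_features`, a change to a feature definition reaches training and serving at the same time, which is what keeps the two in sync.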

Faster Models to Production

Data scientists can concentrate on improving models rather than on the complex infrastructure needed to keep training and serving pipelines in sync

Website:
PaddyPower

Replacing SQL-based feature pipelines with Python and improving the speed of feature engineering: Paddy Power determines betting prices with the help of predictions generated by ML models.
