Solving challenges from regulators, cybercrime and your customer base can be hard. One of the ways of responding and acting to these challenges is by using the power of data and machine learning, for example, to identify fraud, improve user engagement, and ensure responsible gambling by identifying at-risk players.
The success of Machine Learning algorithms in a broad range of areas has led to an increasing demand for ML-platform/solutions in the gaming industry. Many operators are currently focused on managing data pipelines and evaluating or developing machine learning platforms to optimize their operations and stay competitive.
ML is core to what AI-native companies like Uber, Airbnb, and Twitter do for creating new products and redefining customer experience standards. The crucial first step in their ML-process is feature engineering, and it often is the most laborious activity in the model building lifecycle. These AI-native companies almost all have built feature stores to optimize their feature engineering processes across multiple teams and models.
We helped Paddy Power mature their digital transformation by implementing the innovative and essential feature store concept, a central repository of features (input data used to train ML models) in a store that act as an enterprise-wide marketplace of features for different teams with different remits. The feature store enables the reuse of common features and uses case-specific ML-features, for predictive betting models for different sports books, anti-fraud and AML (anti-money laundering) models and player management and responsible gambling models where features are reused across different models.
The concept of a feature store was introduced by Uber in 2017 as part of its internal Michelangelo platform for ML. The feature store is a central place to store curated features within an organization. So what is a feature, exactly? A feature is a measurable property of some data-sample. It could, for example, be the number of customer transactions over a period of time (hour, day, week), the recent performance of a horse in horse-racing, or the average number of deposits and exits within the last hour. Features can be extracted directly from files and database tables or can be derived values, computed from one or more data sources.
Features are the fuel for AI systems, as we use them to train machine learning models so that you can make predictions using new feature values that your model has never seen before.
A feature store enables the reusability of features across your gaming operations, as existing features are visible to all potential users (data engineers, data scientists, machine learning engineers, business analysts, etc). Shared features can then be used to develop models for:
The feature store supports feature enrichment, discovery, ranking, lineage and lifecycle management for features.
In both training and serving models, the feature store plays a valuable role. During model training, the feature store is used to create training data in the file format of choice for the Data Scientists. There is no need to write and run new data pipelines to make feature data available in .tfrecord or .npy or .csv files. Data scientists can interactively generate train/test data in the file format of their choice on the storage platform of their choice (s3, HDFS, etc).
When models are being used, the feature store provides batch applications access to large volumes of feature data, while for online model serving, the feature store provides low latency access to feature data for online applications.
Without a feature store, organizations have ad-hoc scripts and programs for feature engineering with limited sharing of features either within teams or between teams. Features can be rewritten many times, in different ways, by different developers. Feature pipelines also need to be re-written when new training file formats appear (petastorm, for example), and enterprises have little insight into which features are being used in the organization and adding most value. Developers are also required to develop infrastructure to ensure that offline and online feature data is kept consistent, a non-trivial task.
Feature engineering pipelines are written and operated, by Data Engineers, that take data from backend systems, and transform and validate it before filling the feature store with feature data. Data scientists are now freed from heavy feature engineering and dedicate more time to developing higher quality models by selecting features and backfilling train/test datasets that they then use to train models. Data scientists are responsible for training and validating their models before they are deployed for either batch or online applications. ML engineers, who operate models in production, can also lookup feature data in real-time for applications
The Hopsworks Feature Store enables teams to work effectively together, sharing outputs and assets at all stages in machine learning (ML) pipelines. In effect, the Feature Store: