Monitoring machine learning models in production has garnered considerable attention in recent years. The problem of monitoring features independently of model training, however, has received comparatively little attention. In fact, model monitoring can be considered a subproblem of the more general problem of monitoring data in a feature store, i.e., feature monitoring.
Features can be monitored under different criteria, regardless of whether they are used by production models, providing early warning of potential issues with your data. To monitor a machine learning feature over time, we need to compute statistics on the feature values. Hopsworks makes it easy to automate the computation of these statistics on a schedule, or at data ingestion time. The subset of feature values to compute statistics on depends on the use case, and the resulting statistics can be supervised either manually, using visualisations, or automatically, by setting thresholds and reference values.
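To make the automatic approach concrete, here is a minimal sketch of comparing a computed statistic against a reference value and threshold. The function name (`check_feature_mean`) and the numbers are purely illustrative assumptions, not part of the Hopsworks API:

```python
from statistics import mean

def check_feature_mean(values, reference, threshold):
    """Return True if the mean of `values` lies within `threshold`
    of the configured `reference` value, False otherwise.
    Illustrative only: a real setup would compute the statistic
    on a schedule or at ingestion time."""
    return abs(mean(values) - reference) <= threshold

# Feature values from the latest ingestion (hypothetical data).
batch = [10.2, 9.8, 10.5, 9.9]
ok = check_feature_mean(batch, reference=10.0, threshold=0.5)  # True: mean is 10.1
```

In practice the same pattern generalises to any statistic (min, max, standard deviation, null count) with a per-statistic threshold.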
Effective techniques to monitor features include:
- Leveraging human intuition by observing the evolution of feature statistics over time through comparative visualisations.
- Comparing feature statistics against the statistics of the training datasets in which the feature is involved.
- Comparing feature statistics with previously computed feature statistics by defining detection and reference windows to select the subset of feature values to focus on.
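The last technique above, comparing a detection window against a reference window, can be sketched as follows. This is an illustrative stand-in under assumed names (`window_drift`, tuples of `(timestamp, value)` rows), not the Hopsworks interface:

```python
from datetime import datetime
from statistics import mean

def window_drift(rows, detection, reference):
    """Compute the relative difference between the mean of feature values
    falling in the detection window and those in the reference window.
    `rows` is an iterable of (timestamp, value) pairs; each window is a
    half-open (start, end) interval. Illustrative sketch only."""
    det = [v for t, v in rows if detection[0] <= t < detection[1]]
    ref = [v for t, v in rows if reference[0] <= t < reference[1]]
    return abs(mean(det) - mean(ref)) / abs(mean(ref))

# Hypothetical feature values: stable in the reference period, shifted after.
rows = [
    (datetime(2024, 1, 1), 10.0),
    (datetime(2024, 1, 2), 10.0),
    (datetime(2024, 1, 3), 12.0),
    (datetime(2024, 1, 4), 12.0),
]
drift = window_drift(
    rows,
    detection=(datetime(2024, 1, 3), datetime(2024, 1, 5)),
    reference=(datetime(2024, 1, 1), datetime(2024, 1, 3)),
)  # 0.2, i.e. a 20% shift in the mean
```

A drift value above a configured threshold would then raise an alert, while values below it are simply recorded for later visualisation.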
Join us as we delve into the feature monitoring problem and build a holistic view that places model monitoring in the bigger picture. Regardless of whether models are deployed to production, Hopsworks provides different approaches for monitoring machine learning features over time: from the continuous observation of a feature statistic computed on a schedule, to the comparison of feature statistics using detection and reference windows. Moreover, Hopsworks provides a flexible UI that enables powerful visualisations of your feature statistics, so you can spot issues in the blink of an eye.