A data contract provides schema-level guarantees for a feature group. It also includes metadata such as how and where a feature may be used, the expected freshness of the feature, when it was last updated, and, optionally, the service level agreement (SLA) for lookups.
Data contracts are important because they ensure that feature data stays consistent and compatible across the stages of a machine learning system (feature pipelines, training pipelines, and inference pipelines). By providing schema-level guarantees, data contracts help to prevent problems with data quality, reliability, and versioning, so that downstream systems and models can depend on the data they are given. Data contracts can also help to enforce governance and compliance requirements for how feature data is used.
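To make the idea concrete, the following is a minimal sketch of a data contract as a plain Python object. It is not the API of any particular feature store: the `DataContract` class, its field names, and the `schema_violations` and `is_fresh` methods are all assumptions chosen for illustration. It shows how a schema-level guarantee and a freshness expectation could be checked before a downstream pipeline consumes a row.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


@dataclass
class DataContract:
    """Hypothetical contract for one feature group (illustrative names only)."""

    feature_group: str
    schema_version: int
    schema: dict[str, type]              # feature name -> expected Python type
    allowed_uses: frozenset[str]         # e.g. {"training", "online_inference"}
    max_staleness: timedelta             # expected freshness of the features
    last_updated: datetime               # when the feature group was last written
    lookup_sla_ms: Optional[int] = None  # optional SLA for online lookups

    def schema_violations(self, row: dict[str, Any]) -> list[str]:
        """Return a list of schema problems for one row (empty if it conforms)."""
        problems = []
        for name, expected in self.schema.items():
            if name not in row:
                problems.append(f"missing feature: {name}")
            elif not isinstance(row[name], expected):
                actual = type(row[name]).__name__
                problems.append(f"{name}: expected {expected.__name__}, got {actual}")
        # Features that the contract does not declare are also violations.
        for name in sorted(row.keys() - self.schema.keys()):
            problems.append(f"unexpected feature: {name}")
        return problems

    def is_fresh(self, now: datetime) -> bool:
        """True if the last update is within the agreed staleness window."""
        return now - self.last_updated <= self.max_staleness


contract = DataContract(
    feature_group="customer_purchases",
    schema_version=1,
    schema={"product_name": str, "price": float, "purchase_date": datetime},
    allowed_uses=frozenset({"training", "online_inference"}),
    max_staleness=timedelta(hours=24),
    last_updated=datetime(2024, 1, 1, tzinfo=timezone.utc),
)

# The price arrives as a string, which breaks the schema-level guarantee.
bad_row = {
    "product_name": "kettle",
    "price": "19.99",
    "purchase_date": datetime(2024, 1, 1, tzinfo=timezone.utc),
}
print(contract.schema_violations(bad_row))
```

A feature pipeline could run a check like this before writing, and a training or inference pipeline could run it before reading, so that a type change is caught at the boundary rather than inside a model.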
For example, create a feature group containing customer purchase data, with features such as product name, price, and purchase date. The feature group should also include a schema version and metadata such as a description, the owner, and when and by whom it was last updated. Use custom metadata (e.g., schematized tags) to define governance and compliance requirements for the feature group. Then, for a model staged for deployment, you can use provenance to automatically find the features used by the model, and check the governance and compliance requirements by looking up the custom metadata for all of those features.
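The workflow above can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not a real feature store client: the `FeatureGroup` class, the `governance` tag and its `allowed_purposes` and `contains_pii` keys, the `provenance` mapping, and the model name `purchase_recommender_v1` are all hypothetical. In a real system the registry and the provenance graph would be served by the feature store and model registry.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureGroup:
    """Hypothetical feature group record with contract metadata."""

    name: str
    schema_version: int
    description: str
    owner: str
    last_updated_at: str   # ISO 8601 timestamp
    last_updated_by: str
    features: list[str]
    # Schematized tags: tag name -> structured value.
    tags: dict[str, dict] = field(default_factory=dict)


def governance_violations(
    model: str,
    purpose: str,
    provenance: dict[str, list[tuple[str, str]]],
    registry: dict[str, FeatureGroup],
) -> list[str]:
    """Check every feature group a model depends on against its governance tag."""
    violations = []
    # Provenance lists (feature_group, feature) pairs used by the model.
    used_groups = sorted({fg_name for fg_name, _ in provenance[model]})
    for fg_name in used_groups:
        policy = registry[fg_name].tags.get("governance")
        if policy is None:
            violations.append(f"{fg_name}: no governance tag")
        elif purpose not in policy["allowed_purposes"]:
            violations.append(f"{fg_name}: purpose '{purpose}' not allowed")
    return violations


purchases = FeatureGroup(
    name="customer_purchases",
    schema_version=1,
    description="One row per customer purchase",
    owner="data-platform-team",
    last_updated_at="2024-01-01T00:00:00Z",
    last_updated_by="feature-pipeline",
    features=["product_name", "price", "purchase_date"],
    tags={"governance": {"contains_pii": False, "allowed_purposes": ["recommendations"]}},
)
registry = {purchases.name: purchases}

# Assumed provenance: which features the staged model was trained on.
provenance = {
    "purchase_recommender_v1": [
        ("customer_purchases", "price"),
        ("customer_purchases", "purchase_date"),
    ]
}

print(governance_violations("purchase_recommender_v1", "recommendations", provenance, registry))
print(governance_violations("purchase_recommender_v1", "credit_scoring", provenance, registry))
```

Returning a list of violations rather than raising on the first one lets a deployment gate report every non-compliant feature group at once.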