
Schema

What are schemas in feature stores?

A schema defines the shape, order, and type of the data stored in ML artifacts, including feature groups, feature views, training datasets, and models. The schema makes it possible to validate the shape, order, and type of data that is read from, or passed as input to, an ML artifact. Some ML artifacts, such as feature groups, feature views, and models, support schema versioning. A new schema version indicates that the schema has a breaking change compared to the previous version.
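To make the three checks (shape, order, type) concrete, here is a minimal sketch in plain Python. It is not the API of any particular feature store: the `Feature` and `Schema` classes and the example feature names are hypothetical, and Python types stand in for a feature store's own data types.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple


@dataclass(frozen=True)
class Feature:
    name: str
    dtype: type  # hypothetical: a Python type used for the check, e.g. int, float, str


@dataclass(frozen=True)
class Schema:
    features: Tuple[Feature, ...]  # ordered: position matters

    def validate(self, columns: Sequence[str], rows: Sequence[Sequence[Any]]) -> List[str]:
        """Return a list of schema violations; an empty list means the data conforms."""
        expected = [f.name for f in self.features]
        # Shape: the number of columns must match the number of features.
        if len(columns) != len(expected):
            return [f"shape: expected {len(expected)} columns, got {len(columns)}"]
        # Order: columns must appear in the same order as in the schema.
        if list(columns) != expected:
            return [f"order: expected {expected}, got {list(columns)}"]
        errors = []
        for i, row in enumerate(rows):
            if len(row) != len(expected):
                errors.append(f"shape: row {i} has {len(row)} values, expected {len(expected)}")
                continue
            # Type: each value must be an instance of its feature's declared type.
            for feature, value in zip(self.features, row):
                if not isinstance(value, feature.dtype):
                    errors.append(
                        f"type: row {i}, {feature.name} expected "
                        f"{feature.dtype.__name__}, got {type(value).__name__}"
                    )
        return errors


schema = Schema((Feature("customer_id", int), Feature("amount", float), Feature("country", str)))

print(schema.validate(["customer_id", "amount", "country"], [[42, 19.99, "SE"]]))
# []
print(schema.validate(["customer_id", "amount", "country"], [[42, "19.99", "SE"]]))
# ['type: row 0, amount expected float, got str']
```

Checking column order before types means a reordered input is reported as an order problem rather than as a cascade of misleading type errors.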

Why is a schema important for ML artifacts?

  1. Enforcing data contracts: A schema defines the structure and data types of features within a feature group, ensuring that the data conforms to a specific format. This enforces a data contract between producers and consumers of the features, which is crucial for maintaining consistency and reliability in the machine learning pipeline.
  2. Promoting best practices: Defining a schema encourages data engineers and data scientists to follow best practices in data modeling and management, such as deciding on explicit feature names and data types up front.
  3. Versioning: Schemas enable versioning of feature groups. If the structure or data types of a feature group change, a new schema version can be created to accommodate the changes without disrupting existing pipelines or models.
  4. Error detection: With a schema in place, errors in data shape, order, or type can be detected early in the pipeline, making it easier to identify and fix issues before they propagate downstream.
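The versioning rule (a breaking change leads to a new version) can be sketched as a comparison between the old and new schema. This is an illustrative sketch, not any feature store's actual policy: it assumes that appending features at the end is non-breaking, while removing, reordering, renaming, or retyping existing features is breaking. The function names and the type names (`bigint`, `double`, `string`) are hypothetical.

```python
from typing import List, Tuple

# A schema here is an ordered list of (feature_name, type_name) pairs.
SchemaDef = List[Tuple[str, str]]


def breaking_changes(old: SchemaDef, new: SchemaDef) -> List[str]:
    """List changes that would break consumers of the old schema.

    Assumption for this sketch: appending new features at the end is
    non-breaking; removing, reordering, renaming, or retyping is breaking.
    """
    changes = []
    if len(new) < len(old):
        changes.append(f"{len(old) - len(new)} feature(s) removed")
    # Compare position by position, so reorders and renames are also caught.
    for i, (old_feat, new_feat) in enumerate(zip(old, new)):
        if old_feat[0] != new_feat[0]:
            changes.append(f"position {i}: {old_feat[0]} replaced by {new_feat[0]}")
        elif old_feat[1] != new_feat[1]:
            changes.append(f"{old_feat[0]}: type {old_feat[1]} -> {new_feat[1]}")
    return changes


def next_version(current_version: int, old: SchemaDef, new: SchemaDef) -> int:
    """Bump the version only when the schema change is breaking."""
    return current_version + 1 if breaking_changes(old, new) else current_version


v1 = [("customer_id", "bigint"), ("amount", "double")]
appended = v1 + [("country", "string")]
retyped = [("customer_id", "bigint"), ("amount", "string")]

print(next_version(1, v1, appended))  # 1 (appending is treated as non-breaking)
print(next_version(1, v1, retyped))   # 2
print(breaking_changes(v1, retyped))  # ['amount: type double -> string']
```

Keeping the old version alongside the new one is what lets existing pipelines and models continue to read data in the shape they were built against.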