An important challenge in building ML systems is deciding how to structure your ML system: whether it is a batch ML system, a real-time ML system, or an LLM system in which you fine-tune an LLM and then use RAG to connect it to external data.
In this talk, we will present a unified architecture for these ML systems built around the idea of feature, training, and inference pipelines connected by a feature store. This architecture provides a clear set of abstractions with which to build different classes of ML systems, from batch to real-time to LLM systems. It also lifts the discussion of MLOps from technologies to principles, such as versioning, testing, and monitoring of ML systems.
---