In this talk, we will cover the following topics:
• Introduction to Machine Learning Feature Stores (5 min): Understanding the role of feature stores in the MLOps stack and their significance in managing machine learning features within organizations.
• Data management architecture behind Feature Stores (2-3 min): Exploring the underlying mechanisms and data management components employed in feature stores.
• Introduction to DuckDB and Arrow Flight (5 min): Highlighting how DuckDB and Arrow Flight fit into the PyData ecosystem by building on the capabilities of Apache Arrow.
• The journey of integrating DuckDB and Arrow Flight into our Feature Store platform (12 min): Sharing our experience integrating DuckDB and Arrow Flight into the Hudi-based Lakehouse platform that powers our (offline) feature store, including the challenges and successes encountered along the way.
• Benchmarks (5 min): Presenting a benchmark that compares the performance of DuckDB/Arrow Flight with Spark/HiveServer2, particularly for small to medium-sized data.