“In MLOps and machine learning, best practices don't really exist yet as things are developing right now. So basically everyone can participate in the development and provide a great input into it”
Time for episode seven of 5-minute interviews, where we have a chat with Fedor Bystrov, Software Engineer at Skyscanner. Fedor guides us through the architecture and development aspects of Kaleidoscope, his self-developed feature store, and gives us his two cents on why it’s so exciting to work in MLOps right now.
Fedor: My name is Fedor and I work in the MLOps team at Skyscanner as a software engineer. I started out by studying physics, and I’ve worked as a software engineer in billing, monetization and payments at Revolut. Now I'm at Skyscanner doing MLOps, which is a field I got into by chance. I never planned to do it, but I really enjoy it. I think it's a great field of study and work, and more people should get into it. For example, the models that OpenAI develops are mind-blowing, and that motivates me and other people in the field to put in more work and effort. It’s definitely an amazing time to jump into the field.
Fedor: So at Skyscanner my team manages the feature store. We wrote it ourselves, so we have a homegrown feature store and machine learning platform that we developed internally from the ground up. Basically, we manage all of the steps of the MLOps lifecycle, such as model deployment and model monitoring. Model monitoring is also a very interesting topic for us right now. So our MLOps team at Skyscanner works on how to know when a model underperforms, how to deploy a model, and how to deliver it to market faster.
Fedor: We are using AWS and have a pretty standard feature store architecture, with an offline store and an online store. The offline store is just an S3 bucket with some SDKs on top of it, and the online store is DynamoDB with a microservice on top of it. We have ETL jobs and pipelines that move data from offline to online and from online to offline. So architecturally it’s pretty standard stuff.
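To make the offline/online split concrete, here is a minimal Python sketch of the pattern Fedor describes. It is an illustration only, not Skyscanner's code: the offline store (S3 in the interview) and the online store (DynamoDB in the interview) are replaced by in-memory stand-ins, and every name in it (`FeatureRow`, `OfflineStore`, `OnlineStore`, `materialize`, `backfill`, the `searches_7d` feature) is a hypothetical chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class FeatureRow:
    """One set of feature values for one entity at one point in time."""
    entity_id: str
    timestamp: int  # event time, e.g. epoch seconds
    features: dict[str, Any]


@dataclass
class OfflineStore:
    """Append-only history. In-memory stand-in for files in an S3 bucket."""
    rows: list[FeatureRow] = field(default_factory=list)

    def append(self, row: FeatureRow) -> None:
        self.rows.append(row)


@dataclass
class OnlineStore:
    """Latest row per entity. In-memory stand-in for a DynamoDB table."""
    table: dict[str, FeatureRow] = field(default_factory=dict)

    def put(self, row: FeatureRow) -> None:
        self.table[row.entity_id] = row

    def get(self, entity_id: str) -> Optional[FeatureRow]:
        return self.table.get(entity_id)


def materialize(offline: OfflineStore, online: OnlineStore) -> int:
    """Offline -> online: write the newest row per entity.

    Returns the number of rows written to the online store.
    """
    # Reduce the full history to the most recent row for each entity.
    latest: dict[str, FeatureRow] = {}
    for row in offline.rows:
        current = latest.get(row.entity_id)
        if current is None or row.timestamp > current.timestamp:
            latest[row.entity_id] = row

    # Only overwrite online rows that are older than what we found.
    written = 0
    for row in latest.values():
        existing = online.get(row.entity_id)
        if existing is None or row.timestamp > existing.timestamp:
            online.put(row)
            written += 1
    return written


def backfill(online: OnlineStore, offline: OfflineStore) -> int:
    """Online -> offline: append online rows missing from the history.

    Returns the number of rows appended to the offline store.
    """
    known = {(r.entity_id, r.timestamp) for r in offline.rows}
    missing = [
        r for r in online.table.values()
        if (r.entity_id, r.timestamp) not in known
    ]
    for row in missing:
        offline.append(row)
    return len(missing)
```

In this sketch, the offline store keeps every historical row (useful for building training sets), while the online store keeps only the newest row per entity (useful for low-latency lookups at prediction time). Running `materialize` twice in a row writes nothing the second time, which is the kind of idempotence you would typically want from a scheduled ETL job.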
Fedor: If you take a look at recent developments in the market, for example SageMaker, the way they deploy models and integrate models with feature stores is a bit different from ours. What we do is deploy each model into its own EC2 instance in Kubernetes, whereas AWS SageMaker deploys models in Lambda functions, so they deploy it serverless. If you want to integrate with a feature store, you also deploy it in a serverless kind of way. You basically write a script and deploy it. What we do is a bit different. We are running models on EC2 instances in Kubernetes, and we have to deploy and manage all EC2 instances ourselves.
We are using Databricks, which provides similar kinds of stuff to what SageMaker provides. It's always interesting to explore other ways of doing things. Maybe it would be easier for us, for example, to use serverless Lambda functions to deploy models, because then we wouldn't need to manage EC2 instances with Kubernetes and everything. That would simplify our infrastructure a lot, and we could put more effort into other areas, for example alerting and monitoring the models. It's always important to research the field around you so you're efficient.
Fedor: What I find interesting is how rapidly the field develops. You can work in software engineering, which has existed for like 40 years, and do things and there will be best practices and industry standards (for example, Kubernetes). In MLOps and machine learning these things don't really exist yet as things are developing right now. So basically everyone can participate in the development and provide a great input into it. For me, it's a way of doing something meaningful for others. I want to be part of this community and contribute to it. It’s a field of endless opportunities!
Fedor: I think a great thing you can do is to join local MLOps meetups; they are organized locally all over Europe. If you are participating, I also think it’s good to not only listen, but to share your experience (if you already work in the field). For me, it’s also interesting to meet and talk to other engineers who are working on similar problems and to get a different perspective.
In terms of reading, I can recommend a book. It’s not really about MLOps but close enough: it’s called “The DevOps Handbook”. It’s pretty high level, but it lays a baseline for MLOps, because a lot of good practices from DevOps (which is a field that has existed for longer) can also be applied in MLOps.
Listen to this and other episodes on Spotify: