5-minute interview Tun Shwe

Episode 20: Tun Shwe, VP of Data/DevRel - Quix

July 8, 2024

6 min read

Hopsworks Team

Hopsworks Experts

5-minute Interviews

TL;DR

In the early days, we were trying to create structure from data through feature engineering to handle unstructured or semi-structured data. Now, we're in this magical time with vector databases where you don't need to worry about structure and can deal with unstructured data.

We’ve reached the 20th episode of this series! This time we chat with Tun Shwe, VP of Data/DevRel at Quix. Tun highlights the evolving challenges and innovations in data engineering and shares his journey from product engineering to data science.

Tell us a little bit about yourself.

My name is Tun Shwe. I'm the VP of Data, and I look after developer relations at Quix. Quix is a company that builds stream processing tools for Python developers. We created Quix Streams, which is an open-source, pure Python stream processing library. The remarkable thing about it is that it uses streaming data frames. We also manage a product called Quix Cloud, which is a serverless SaaS platform that enables you to build out your data pipelines and deploy the steps in the pipeline as Docker containers.

What is stream processing in the context of ML and AI?

Stream processing is about processing data as it is generated. Oftentimes, we work with batch data. That seems to be the majority of what people do; they allow data to be generated, collect it over time, and then, at some point in the future, schedule a job to read and process it. With stream processing, you are processing that data as it's generated. So, we're talking low latency and no delays.
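To make the contrast concrete, here is a minimal sketch in plain Python. It is not Quix Streams code; the function names and the sensor readings are made up for illustration. The batch function waits for the full collection before computing anything, while the streaming function emits an updated result as each value arrives.

```python
from statistics import mean
from typing import Iterable, Iterator


def batch_average(readings: list[float]) -> float:
    """Batch style: wait until all readings are collected, then process once."""
    return mean(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Streaming style: emit an updated average as each reading arrives."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


# Hypothetical sensor readings, invented for this example.
readings = [20.0, 22.0, 21.0, 25.0]

# One answer, available only after all the data has been collected.
batch_result = batch_average(readings)

# One answer per reading, available as soon as that reading is seen.
stream_results = list(streaming_average(iter(readings)))

print(batch_result)
print(stream_results)
```

In a real streaming system the input would be an unbounded source such as a Kafka topic rather than a finite list, so the loop would never finish and each intermediate result would be acted on immediately.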

How did you get into the field?

I used to run a Product Engineering studio, where we were solving what were called "big data" problems for companies. We used a tech stack with tools like Hadoop and Spark. Around that time, I found out about this fascinating creature called the Data Scientist, essentially a Statistician who could code. It was an interesting intersection of two things I liked: math and software engineering. I wanted to understand that more and maybe work with them. I ended up getting a job with a large scientific publishing company, and my first machine learning projects were around recommender systems. I spent my early years covering principles that you see again today with LLMs and techniques like RAG, such as top-k and figuring out the nearest neighbor. So, it's kind of come full circle with the power of LLMs.
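For readers unfamiliar with the top-k and nearest-neighbor idea mentioned above, the following is a rough sketch in plain Python. It is not taken from any system Tun describes; the item names and two-dimensional vectors are hypothetical, and cosine similarity is assumed as the similarity measure. Both a recommender and a RAG retrieval step can be viewed as ranking stored vectors against a query vector and keeping the k closest.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (assumes neither is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], items: dict[str, list[float]], k: int) -> list[str]:
    """Return the names of the k items most similar to the query."""
    ranked = sorted(
        items,
        key=lambda name: cosine_similarity(query, items[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical item embeddings, invented for this example.
articles = {
    "streaming": [1.0, 0.0],
    "batch": [0.0, 1.0],
    "kafka": [0.9, 0.1],
}

result = top_k([1.0, 0.0], articles, k=2)
print(result)
```

This brute-force version compares the query against every stored vector; vector databases typically use approximate indexes to avoid that full scan at scale.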

Why do you think this is such an interesting field?

I quickly learned that when you start working with big data or data at high volumes, you uncover some unique challenges, which often tease out the best parts of data engineering. All the best practices from software engineering, especially distributed programming, come into play. I was really fascinated by that niche of software engineering where we deal with really difficult problems at scale. Scaling issues were very important. If you go back nine years, it was still very early days for data science. Companies were realizing the power of deploying machine learning models. In the early days, you'd have them in Jupyter notebooks. Then, it became about how to go from a Jupyter notebook to something running as a service in production. It required a different toolchain and a different set of skills. The skills I had built up over the years helped me apply the right principles to put those models into production. The fascinating part is it's come full circle. In the early days, we were trying to create structure from data through feature engineering to handle unstructured or semi-structured data. Now, we're in this magical time with vector databases where you don't need to worry about structure and can deal with unstructured data. It's a really interesting time where you can use almost anything as an input. That will never get boring.

Any resources you can recommend?

I guess the mission that I'm on right now is to bring stream processing to Python developers everywhere and allow them to access real-time data in Kafka. I recommend checking out Quix Streams on YouTube. We're focused on building things from scratch in a code-along style, so you learn the concepts from the ground up. As I mentioned, I ran a product engineering studio, and I think a lot of Data Scientists and Data Engineers would benefit from bringing product thinking to their work. One of the books I usually recommend is "Sprint" by a couple of authors from Google Ventures. It's about creating value in a short space of time and testing it, which fits well within the confines of scientific research and putting things into production. I always recommend learning about product thinking.
