Hopsworks Team
Hopsworks Experts

Episode 22: David Colls, Founding Data Scientist - Nextdata
September 4, 2024
6 min read

TL;DR

“Building machine learning products is a really multidisciplinary exercise so you might come at it from different perspectives.”

Join Rik as he chats with David Colls, Founding Data Scientist at Nextdata, about Data & AI, the evolution of machine learning, and effective tech practices.

Tell us a little bit about yourself

I'm David Colls, the former director of Data and AI at Thoughtworks Australia. Prior to Thoughtworks, my career took me through data-intensive initiatives in scientific and engineering computing, building CAD software, simulation tools and so on. I joined Thoughtworks to do software delivery and organizational transformation. Putting those two elements together brought me into the data and AI space, with a focus on the technology but also on how teams work together and how we get the most out of the people and the technology that we're using to build data solutions. At Thoughtworks I also did a podcast, and I recently wrote a book with some colleagues called “Effective Machine Learning Teams”.

Now I've joined a company called Nextdata, founded by Zhamak Dehghani, who originated the idea of data mesh about five years ago and has written a book in the meantime. She went on to found a company to advance the idea that decentralized data management can unlock the next wave of innovation in analytics, AI and ML. It's an interesting intersection between how teams work with data and how technology can support ML and AI use cases.

In your book you talk about the correlation between building good software (engineering best practices) and creating effective AI/ML teams. Could you tell us more about that?

Building machine learning products is a really multidisciplinary exercise, so you might come at it from different perspectives. For example, from a product perspective, you identify where the demand is and what might work for customers. From a data science or machine learning perspective, you're very focused on the model and getting the most performance out of it. And from a software engineering perspective, you're looking at being able to confidently make small changes and manage an operational solution. There are a lot of different perspectives that come together in building machine learning solutions, and we were trying to bring those together in a way that we hadn't seen previously. Other books and material had focused on technical practices for MLOps, on the data science of improving model performance, or on the business applications. We saw there was an opportunity to bring all of those together.

Given that most products are built by multidisciplinary teams these days, we saw it as a resource that people with different perspectives could come together around to understand what it takes to do this as a whole team. At Thoughtworks, working with numerous organizations, advising them on strategy but also delivering solutions, we've seen a number of challenges. We've seen challenges with centralized approaches that rely on one team to understand many different parts of the business. That team can be overwhelmed by demand, and while they might be experts in the technology or the techniques they're using, they're not experts in the business problems they're trying to solve. We've also seen difficulty with technology that makes it hard to manage change in small batches, which is, I guess, the aim of a lot of software engineering workflows. Older technology can make it really hard to make progress in an agile way, to build confidence with stakeholders that you're moving in the right direction, or to have confidence in yourself that you can make small changes and proceed. Having seen these challenges, we recognized that there were better ways: bringing in the learning from software engineering, such as continuous deployment and agile techniques, to deliver solutions incrementally.

Do you have any interesting resources to recommend?

First up, I would mention my book “Effective Machine Learning Teams”. For those who haven't heard about data mesh, I would recommend checking out Nextdata, and if you are interested in evaluating it, there's an early access program.

In terms of interesting formative reading, I recommend some books that have influenced my thinking over time. First up, when it comes to designing software systems, “The Design of Everyday Things” by Don Norman is a classic. In an age of Gen AI, when it might sometimes be difficult to infer the affordances and capabilities of Gen AI systems, I think that's a really pertinent book. When it comes to building the right thing, working with teams, and understanding other people's point of view, there's a book called “Being Wrong” by Kathryn Schulz, which is a great look at what it means to make mistakes and realize you've made mistakes. And finally, around getting a team that loves working and playing together, “Let My People Go Surfing” by Yvon Chouinard.
