Why did you choose to study Computer Science?
I was always interested in STEM. My dream was to study Pharmacology. However, with computer science I could more easily imagine what I would do after my studies, even if what I imagined as a student is not what I’m doing today.
The younger me imagined a hacker girl, but I have not developed any code in the past 8 years. My focus has been product management, which consists of backlog manicuring and organizational planning, and more recently people and strategy management.
My success, however, would not have been possible without knowledge of the underlying software development and data science concepts.
How did you then decide to become a Product Manager at Zalando and move into Product Management in Data Science?
While I was doing my PhD, I realized that in addition to my technical skills, I am also good at having a bigger vision and organizing teams, workflows, etc. Once I was finished with my PhD, product management seemed to be the right thing for me.
I turned out to be right.
What made you passionate about product management in data science?
Product management has been associated traditionally with software development, while data science mostly happens in R&D-like projects.
What aspects of software development should data science adopt?
Data science should adopt the practices of agile development, which entails releasing fast, increasing accuracy iteratively and owning the whole process end-to-end.
End-to-end means the data science product team owns the infrastructure: machine learning pipelines, quality gates, monitoring, A/B testing and dashboards.
What are some ways in which product management for data science differs from software?
The need for exploration
First of all, we need to leave some room for exploration. There is so much happening in the AI world that we need to give data scientists the time to experiment.
Cross-functional expertise is needed for data science products
People with different backgrounds are needed in data science. In software development there are mainly software engineers involved. In data science, you have data scientists, machine learning engineers, machine learning operations, data engineers. All these people have different backgrounds and are passionate about certain topics to different extents. While data scientists have probably studied mathematics and statistics, data engineers might have studied software development or data engineering. The data science people tend to go into depth talking about the model, while the data engineers tend to go into depth talking about the infrastructure.
A good product manager needs to bring professionals from different backgrounds together as a team.
That is a big challenge.
Collaboration with other product teams
Because data science products are normally not stand-alone but supplement another product, collaboration with other product teams is critical. You can create a great cross-sell recommendation algorithm that ends up poorly integrated into the online shops, for example by not having the right pictures or by showing up further down the page than would be optimal, forcing users to scroll a lot.
Even if the quality of the algorithm and the KPIs are all great, if the model is not integrated well into the end product, the poor user experience can kill all your data science efforts.
Deciding when the model quality is sufficient for release
Another difference is the challenge of deciding when the model quality is sufficient for release. In classic software development, you have a clear criterion for when the software is ready for release: when the feature is ready. In data science, it is not enough to have a model in place. You need to know when the model is good enough for the dedicated use case. The product manager or owner chooses the right quality metric and the threshold at which the quality is sufficient to release the model.
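The idea of a quality metric plus a release threshold can be sketched as a simple quality gate. This is only an illustration under stated assumptions: the class name ReleaseGate, the metric name precision_at_10 and the threshold value 0.30 are hypothetical and do not come from the interview.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ReleaseGate:
    """A quality metric plus the minimum value required for release."""

    metric_name: str  # hypothetical example: "precision_at_10"
    threshold: float  # chosen per use case by the product manager or owner

    def is_ready(self, measured: Mapping[str, float]) -> bool:
        # A model that was not evaluated on the agreed metric is never releasable.
        if self.metric_name not in measured:
            return False
        return measured[self.metric_name] >= self.threshold


# Hypothetical evaluation results for a candidate model.
gate = ReleaseGate(metric_name="precision_at_10", threshold=0.30)
candidate_metrics = {"precision_at_10": 0.34, "coverage": 0.81}
print(gate.is_ready(candidate_metrics))
```

Keeping the metric and the threshold in one explicit object makes the release decision visible to the whole team, and the same check could run as a quality gate in a machine learning pipeline.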
How do you define quality in data science?
Different aspects of quality in data science include training time, prediction accuracy and population coverage:
Training time is how much time you need to train the model.
Population coverage means the share of the population (customers, products, etc.) for which the model can actually make predictions.
Prediction accuracy, which can be measured in different ways, reflects how accurately the model predicts unknown values. How to measure quality depends very strongly on the use case.
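Two of these quality aspects can be illustrated with a small sketch. It assumes that a missing prediction is represented as None and uses mean absolute error as one possible accuracy measure. Both choices, as well as the function names and the sample data, are assumptions made for illustration and are not taken from the interview.

```python
from typing import Optional, Sequence


def population_coverage(predictions: Sequence[Optional[float]]) -> float:
    """Share of the population for which the model returned a prediction."""
    if not predictions:
        return 0.0
    covered = sum(1 for p in predictions if p is not None)
    return covered / len(predictions)


def mean_absolute_error(
    predictions: Sequence[Optional[float]], actuals: Sequence[float]
) -> float:
    """Average absolute error over the items that received a prediction."""
    pairs = [(p, a) for p, a in zip(predictions, actuals) if p is not None]
    if not pairs:
        raise ValueError("no covered items to evaluate")
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


# Hypothetical data: the model could not predict for the second item.
predicted = [10.0, None, 7.0, 4.0]
actual = [12.0, 5.0, 7.0, 5.0]

print(population_coverage(predicted))          # 3 of 4 items covered
print(mean_absolute_error(predicted, actual))  # error on covered items only
```

Reporting coverage next to accuracy matters because a model can look very accurate simply by predicting only for the easy part of the population.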
How do you measure success for a data science product?
Measurement should be business-driven.
KPIs should show that we have business impact, such as increased sales, loss prevention, etc. It is wrong to only focus on the quality of the model.
We can have a great recommender system, but in the end, if people are not interested in exploring additional products and purchasing more, then the recommender does not have the desired impact.
What are a few of the key success factors that are important to keep in mind when building a data science product management organization?
Get the right engineer-to-data-scientist ratio
People start hiring data scientists, forgetting that they need somebody to run the system. My recommendation is to hire as many machine learning engineers as data scientists. For mature projects, even more machine learning engineers are needed than data scientists.
Data is the basis
Usually the data is incomplete or even nonexistent. Even if we have great ideas and a bunch of data scientists, they won’t be able to operate without data.
Making the right centralized vs. decentralized decision
One difficult decision companies face is whether to centralize data science products or create decentralized teams. At the beginning it might be better to go for a centralized data science organization and centralize the use case creation and production. Once a certain maturity level is achieved, specialized data scientists in dedicated product teams can be more beneficial.
What will the future data scientist roles in data science product teams look like?
The roles will become more technical. Being a great data scientist will not be enough in the future. You need to understand data engineering and machine learning pipelining as well.
Do new technologies such as no-code, low-code and serverless change the technical requirements?
We are using Vertex AI. You need to understand machine learning to use it, and you still need to build the infrastructure around it. Even with low-code and no-code tools, you still need to understand the use case and the associated technical requirements.
What have been a few secrets to your success?
A combination of a good technical background and soft skills, such as empathy for people. This combination lets me communicate technical topics to non-technical people and translate business requirements into technical language.
Who is Anna Hannemann, PhD?
As a Domain Owner for Data Science at Metro.digital, Anna drives the story of data science and AI within Metro business. In her previous positions, Anna led product teams in areas of recommender systems and robotics/smart logistics. Prior to that, Anna gained several years of experience in software development followed by a PhD in Data Science. Additionally, Anna contributes proactively to a range of initiatives focused on enablement and empowerment of women in tech.