The secrets of a modern data science team
Diversity: the driver of our team
With a cross-functional team spread out across Germany, we power all product recommendations within the METRO world. With more than 17 million customers in over 20 countries, it is quite a challenge to serve personalized recommendations in real time across multiple touchpoints and use cases. There are many factors we have to consider to do this successfully, from practical concerns about how to ensure our algorithms run smoothly in all countries to higher-level questions about how to integrate the data coming from all these countries to properly fine-tune our models' performance. The secret to our success?
Diversity. Our team doesn't only consist of data scientists; we also have software and data engineers, a data analyst, an agile coach, and a product manager. That might sound like a lot, but every role brings an essential skillset and perspective to the table. It is the combination of these different areas of expertise that makes us better at navigating the complex and uncertain world of large-scale machine learning solutions.
Connecting our products with millions of customers
Currently, we provide 3 different types of product recommendations to make our customers' lives easier:
1. Inspirational recommendations for cross-selling products.
2. An alternatives recommender for conveniently finding product substitutes when something is out of stock.
3. Promotional products, so that customers can easily find the products that are currently on promotion and most relevant to them.
In short, we strive to identify the relationships between products and customers, as well as between the products themselves, so that we can anticipate our customers' needs and make their shopping experience smoother and more efficient. With more than 80,000 products in our German assortment alone, we can make our customers' lives a lot easier by providing the right recommendations in the right place.
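To make the idea of product-to-product relationships concrete, here is a minimal sketch of a co-occurrence recommender: it counts how often two products land in the same basket and suggests the most frequent companions. This is an illustration only, not the models described in this article. The function names and the toy baskets are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(baskets):
    """Count how often each pair of products appears in the same basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # Deduplicate so a product bought twice in one basket counts once.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, product, k=3):
    """Return up to k products most often bought together with `product`."""
    companions = counts.get(product, Counter())
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(companions.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical toy baskets, invented for this example.
baskets = [
    ["pasta", "tomato_sauce", "parmesan"],
    ["pasta", "tomato_sauce"],
    ["pasta", "olive_oil"],
    ["tomato_sauce", "parmesan"],
]
counts = build_cooccurrence(baskets)
print(recommend(counts, "pasta"))
```

With an assortment of tens of thousands of products, raw pair counts like these would need normalization and a far more scalable representation, which is where learned models come in.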
While this certainly is a challenge, we have a clear advantage at METRO: we can attribute every single purchase to a specific customer. With more than 150 million purchases in 2021 alone, we have a great data foundation for building data science solutions. In addition, we have a variety of customer touchpoints like our online shop, our mobile app, or direct marketing communication. By collecting and evaluating user interaction data, we can find a lot of information about which products (or other content) are interesting for specific customers. In the end, it's all about delivering smart solutions to make each day a success for our customers – our mission at METRO.digital.
Working smarter
However, this is a huge amount of data and dealing with it can be difficult and extremely time-consuming. We have to process it, cross-reference different information sources, and make sure everything is clean and can be used to answer our questions before we can integrate it into our model training datasets and analyze it. Therefore, we recently decided to put a lot more emphasis on making this process less labor-intensive. For example, by switching to a data transformation and integration platform called dbt, we can now create a single view of the data we need, together with quality tests, documentation, and a clearly understandable data lineage. It saves our data scientists and analysts a lot of time and lowers the barrier to creating more powerful models with more data. In other words, we aim to work smarter, not harder! This is only possible thanks to the high focus on data in the rest of the company. Data is standardized to a large degree across our 20 METRO countries, making it easier to combine information from all these countries, and our central data infrastructure team makes sure that it is easy to find in our data lake on BigQuery.
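As an illustration of what such quality tests check, the following sketch mimics two common rules, "this column is never empty" and "this column has no duplicates", in plain Python. In dbt itself, tests are declared alongside the models rather than written like this. The function names, column names, and sample rows are all hypothetical.

```python
def check_not_null(rows, column):
    """Return the rows in which `column` is missing or None."""
    return [row for row in rows if row.get(column) is None]


def check_unique(rows, column):
    """Return the non-null values of `column` that occur more than once."""
    seen, duplicates = set(), set()
    for row in rows:
        value = row.get(column)
        if value is None:
            continue  # Missing values are left to the not-null check.
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return sorted(duplicates)


# Hypothetical purchase records, invented for this example.
purchases = [
    {"purchase_id": "p1", "customer_id": "c1"},
    {"purchase_id": "p2", "customer_id": "c2"},
    {"purchase_id": "p2", "customer_id": "c3"},   # duplicate purchase_id
    {"purchase_id": "p3", "customer_id": None},   # missing customer_id
]

print(check_unique(purchases, "purchase_id"))
print(len(check_not_null(purchases, "customer_id")))
```

Running checks like these before data reaches a training dataset means a model is never silently trained on duplicated or unattributed purchases.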
A great tech stack for a smarter workflow
METRO.digital's close collaboration with Google is another key factor in our quest to deliver great recommendations while working smarter. We have been early adopters of Google's Vertex AI platform, which we use to run all of our machine learning workflows. As a result, many of our data-related needs can be served by a single platform, Google Cloud Platform. We can access our data, filter it for analysis, build training pipelines, and experiment with our models without any additional engineering effort. This makes our workflows more streamlined than they used to be.
Once a model is ready to be delivered to our customer touchpoints, we use state-of-the-art technologies to ensure that the recommendations it generates are shown to customers as fast as possible. Every time a customer clicks on a product in the online shop or wants to see which products are on promotion in their local store, we receive an automated request for a personalized list of products through our API (our gateway to the METRO world) and must send a response in real time. To make sure this happens quickly and to catch any errors, we write our API in Go and use Kubernetes and Seldon Core for scaling and monitoring. This tech stack also makes A/B testing, an essential step for improving model performance, much more convenient for us.
A/B testing: our North Star
That said, A/B testing is still a challenge to get right. We want to test ideas as fast as possible to iterate over our solutions and increase our understanding of what products are most relevant to our customers. It is very difficult to assess the quality of machine learning models without actually trying them out, because they adapt to customer behavior in real time. We can evaluate them on historical datasets, but in the end, we only have a rough estimate of what a great recommendation is until we observe the model in action. This is especially challenging for inspirational recommendations, since we might not know from previous transactions how individual customers react to some types of products, similar to how Netflix cannot quite know how you might like a movie genre you have never watched before.
We are constantly working to improve our A/B testing setup to make it simpler for data scientists to try out any model they produce. Ideally, data scientists can go from idea to experimentation in only a few hours, without depending on large engineering efforts. We currently serve different models in every country, so we are running up to 20 A/B tests at the same time. This makes it even more important to reduce the manual effort involved and to optimize our workflows for starting, running, and evaluating new tests. With an incredibly diverse customer base at METRO across countries and sectors, we try to find solutions that are flexible enough yet still give each of our customers a great personalized shopping experience.
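One common building block for running many tests side by side is deterministic assignment: hashing the customer ID together with an experiment name, so that a customer always sees the same variant without any stored state. The sketch below shows that general technique under stated assumptions. It is not a description of the setup used here, and the function name, experiment name, and customer IDs are hypothetical.

```python
import hashlib


def assign_variant(customer_id, experiment, variants=("control", "treatment")):
    """Deterministically map a customer to one variant of an experiment."""
    # Including the experiment name makes assignments independent across
    # experiments, e.g. one experiment per country and use case.
    key = f"{experiment}:{customer_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]


# Hypothetical experiment name and customer IDs, invented for this example.
experiment = "de-inspiration-model-b"
for customer_id in ("customer-1", "customer-2", "customer-3"):
    print(customer_id, assign_variant(customer_id, experiment))
```

Because the assignment is a pure function of its inputs, any service can recompute it on the fly, and the evaluation can later reconstruct who saw which model.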
Passion for more
All in all, even though we have a very good setup for continuously improving our machine learning models, there is still a lot of room to strive for more. We want to make data science workflows even easier and allow progressive A/B testing rollouts. At the same time, we also want to cover more recommendation use cases within different customer touchpoints, with more and more knowledge about what makes a great recommendation in the first place.
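A progressive rollout can be sketched as a threshold on a stable per-customer bucket: raising the exposed fraction only ever adds customers, so nobody who already sees the new model is switched back. This is a minimal sketch of that general idea, not a planned implementation. The function names, the bucket count, and the experiment name are assumptions made for illustration.

```python
import hashlib

BUCKETS = 10_000  # Assumed granularity: fractions resolve to 0.01 percent.


def rollout_bucket(customer_id, experiment):
    """Map a customer to a stable bucket in the range [0, BUCKETS)."""
    key = f"{experiment}:{customer_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % BUCKETS


def in_rollout(customer_id, experiment, fraction):
    """Return True if the customer falls inside the exposed fraction."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return rollout_bucket(customer_id, experiment) < fraction * BUCKETS


# Hypothetical ramp: expose 5 percent first, then 20 percent, then everyone.
customers = [f"customer-{i}" for i in range(2000)]
for fraction in (0.05, 0.20, 1.0):
    exposed = sum(in_rollout(c, "new-ranking-model", fraction) for c in customers)
    print(f"{fraction:.0%} target -> {exposed} of {len(customers)} customers")
```

Since each customer's bucket never changes, the group exposed at 5 percent is a subset of the group exposed at 20 percent, which keeps the experience consistent while the ramp proceeds.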
If these challenges excite you and you want to work in a diverse team with a very open mindset, just reach out to us! We are always looking for people who have passion for food and hunger for tech to make our customers' lives easier with the help of data. We offer working student roles, internships, and full-time positions at METRO.digital. Check our open positions.