
From Handcrafted Fixtures to AI-Generated Test Suites: Data Quality at Scale

Data quality testing has long been a painful chore: handcrafted mock data that never covers all edge cases, brittle test setups, poor ROI.

In a backbone pipeline for Otto Group one.O's complex retail media landscape, we often encountered these issues while computing behavioral attributes for millions of users. With real data ruled out by cost and compliance considerations, crafting mock data that covers the large combinatorial space took a disproportionate amount of development time. But in this new era of generative AI, new solutions emerge.

This talk explores how dbt transformed our approach to data testing, from basic schema checks to sophisticated unit tests, and how we integrated LLM-powered AI agents to automatically generate mock data and test cases for new attributes, turning a tedious manual process into a scalable, intelligent pipeline.

Real use cases, real data, real lessons learned.

Prerequisites

Basic data engineering knowledge and SQL experience required.
Familiarity with data pipelines and quality challenges assumed.
dbt knowledge helpful but not required.
Interest in practical LLM applications in engineering workflows a plus.
No ML/AI background needed.

Learning objectives

Attendees will

understand how to leverage dbt data tests and unit tests for scalable data testing,
learn how LLMs can generate mock data and test cases reliably, and
take home a real-world impression of Otto Group one.O's full-scale mock and sample data test setup.

Speaker

Felix Theodor has been a Data Engineer at Otto Group data.works (now part of Otto Group one.O) since 2022. He has been working with dbt on different products and pipelines since 2023 and actively rides the wave of generative AI (for example, by not writing this text himself).