
How to build reliable data pipelines by reusing software development best practices

You have been asked to implement a new data pipeline architecture. The aim: turn a wealth of raw data into user-friendly, off-the-shelf data inputs for the data science, analytics and reporting teams.

Drawing on software development best practices, you will learn how to enforce a successful workflow, tell good software implementations from bad ones, avoid pitfalls and make data available across the company.

Overview of software development best practices (Don’t Repeat Yourself, Decouple, Design by Contract, Crash Early)
Hands-on example: a Python data pipeline (Standardisation: formatting with black, type checking with mypy, linting with pylint; Testing & Documentation: unit tests; Automation: CI/CD integration)
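
As a rough illustration of two of the practices listed above, Design by Contract and Crash Early, here is a minimal sketch of a single pipeline step. It is not taken from the talk: the names Order, parse_order and total_revenue, and the record fields order_id and amount, are hypothetical and chosen only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Typed record handed to downstream steps (hypothetical schema)."""

    order_id: str
    amount_eur: float


def parse_order(raw: dict) -> Order:
    """Turn one raw record into an Order, failing fast on bad input."""
    # Design by Contract: state the precondition explicitly.
    missing = {"order_id", "amount"} - raw.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")

    # float() raises if the value cannot be converted to a number.
    amount = float(raw["amount"])

    # Crash Early: reject impossible values here, not three steps later.
    if amount < 0:
        raise ValueError(f"negative amount for order {raw['order_id']}")

    return Order(order_id=str(raw["order_id"]), amount_eur=amount)


def total_revenue(orders: list[Order]) -> float:
    """Decoupled aggregation step: it only ever sees validated Orders."""
    return sum(order.amount_eur for order in orders)
```

Because validation lives in one function, a unit test can feed it malformed records and assert that a ValueError is raised, while the aggregation step is tested separately with already valid Order objects.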

Prerequisites

The target audience is entry-level data engineers, data scientists and data analysts looking for conceptual ideas and best practices from software development that they can reuse when implementing data pipelines.

Learning objectives

Understand the benefits of adopting software development best practices in your systems;
See, through a concrete example, what these practices look like and which trade-offs need to be taken into account;
Learn how to implement these practices in your production environment.

Speaker

Olivier Bénard is a Data Engineer in the retail industry. He looks after a Google Cloud data platform and builds ETL/ELT data pipelines and the cloud infrastructure behind them, helping to shape the scalable future of data-driven work in this industry. His background is in software engineering.