How to build reliable data pipelines by reusing software development best practices.

You have been asked to implement a new data pipeline architecture. The aim: turn a wealth of raw data into friendly, off-the-shelf data inputs for the data science, analytics, and reporting teams.

Following software development best practices, you will learn how to establish a successful workflow, distinguish good software implementations from bad ones, avoid common pitfalls, and enable data to be served across the company.

  • Overview of Software Development Best Practices (Don't Repeat Yourself, Decoupling, Design by Contract, Crash Early)
  • Hands-on Example: A Python Data Pipeline (Standardisation: formatting with black, type checking with mypy, linting with pylint; Testing & Documentation: unit tests; Automation: CI/CD integration) (see the sketches after this list)
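
For illustration, here is a minimal sketch of what a pipeline step built around these principles could look like. All names (Order, validate_order, to_report_row, run_pipeline) are hypothetical, not code from the talk:

    from dataclasses import dataclass

    # Hypothetical record type; the field names are illustrative only.
    @dataclass(frozen=True)
    class Order:
        order_id: str
        amount_eur: float

    def validate_order(order: Order) -> Order:
        """Design by contract: reject invalid input instead of passing it on."""
        # Crash early: fail at the boundary, not deep inside the pipeline.
        if not order.order_id:
            raise ValueError("order_id must not be empty")
        if order.amount_eur < 0:
            raise ValueError(f"negative amount for order {order.order_id}")
        return order

    def to_report_row(order: Order) -> dict[str, str]:
        """Decoupled transformation step: a pure function, no I/O, easy to test."""
        return {"id": order.order_id, "amount": f"{order.amount_eur:.2f}"}

    def run_pipeline(raw_orders: list[Order]) -> list[dict[str, str]]:
        # Don't Repeat Yourself: validation lives in one place
        # and is reused for every record.
        return [to_report_row(validate_order(o)) for o in raw_orders]

    if __name__ == "__main__":
        orders = [Order("A-1", 19.99), Order("A-2", 5.00)]
        print(run_pipeline(orders))

Because each step is a small pure function with an explicit contract, a bad record fails loudly at the entry point rather than corrupting downstream reports.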
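A matching test sketch with the standard-library unittest module, assuming the step above is saved as pipeline.py (again, module and names are illustrative):

    import unittest

    # Assumes the sketch above is saved as pipeline.py.
    from pipeline import Order, validate_order, to_report_row

    class TestPipelineSteps(unittest.TestCase):
        def test_valid_order_passes_through(self):
            order = Order("A-1", 19.99)
            self.assertEqual(validate_order(order), order)

        def test_negative_amount_crashes_early(self):
            # The contract is part of the behaviour, so it is tested explicitly.
            with self.assertRaises(ValueError):
                validate_order(Order("A-2", -1.0))

        def test_report_row_format(self):
            self.assertEqual(
                to_report_row(Order("A-1", 19.99)),
                {"id": "A-1", "amount": "19.99"},
            )

    if __name__ == "__main__":
        unittest.main()

Tests like these are exactly what a CI/CD integration would run on every commit, alongside black, mypy, and pylint.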

Prerequisites

The target group is entry-level data engineers/scientists/analysts looking for conceptual ideas and best practices on how to implement data pipelines by reusing the best practices of software development.

Learning objectives

  • Understand the benefits of adopting software development best principles in your systems;
  • See, through a concrete example, what these principles look like in practice and which trade-offs to take into account;
  • Learn how to implement these principles in your production environment.

Speaker

Olivier Bénard
Olivier Bénard is a Data Engineer in the retail industry. He looks after a Google Cloud data platform, building ETL/ELT data pipelines and the cloud infrastructure they run on to shape the scalable future of data-driven work in the industry. His background is in software engineering.
