Building Scalable Data Pipelines: Airflow, dbt, and SQL Optimization in Action
In today’s data-driven organizations, designing robust data pipelines is essential to ensure fast, reliable analytics at scale. This talk showcases a practical implementation of scalable data workflows using Apache Airflow, dbt, and SQL optimization — all deployed on Google Cloud Platform (GCP).
We’ll walk through the architecture and key design choices made during a migration to GCP, with a focus on orchestrating data ingestion (batch and streaming), transforming data with dbt, and tuning SQL for performance. You’ll see how DevOps-inspired practices, such as modularity, CI/CD, and environment management, help ensure automation, reliability, and long-term maintainability.
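To make the orchestration pattern concrete, here is a minimal sketch of the kind of DAG the talk walks through: a daily batch load from Cloud Storage into BigQuery followed by a dbt run. The bucket, table, project, and path names are hypothetical, and the exact operators available will depend on your Airflow provider versions.

```python
# Minimal Airflow DAG sketch: batch ingestion from GCS into BigQuery,
# then a dbt run. Bucket, dataset, and path names are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

with DAG(
    dag_id="daily_sales_elt",          # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Load the day's raw files from a GCS bucket into a staging table.
    load_raw = GCSToBigQueryOperator(
        task_id="load_raw_sales",
        bucket="example-raw-bucket",                      # hypothetical
        source_objects=["sales/{{ ds }}/*.csv"],
        destination_project_dataset_table="example_project.staging.sales",
        source_format="CSV",
        write_disposition="WRITE_TRUNCATE",
    )

    # Run the dbt transformations once the raw data has landed.
    run_dbt = BashOperator(
        task_id="run_dbt_models",
        bash_command="dbt run --project-dir /opt/dbt --target prod",
    )

    load_raw >> run_dbt
```

Keeping the ingestion and transformation steps as separate tasks is what makes the pipeline modular: each step can be retried, tested, and promoted through environments independently.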
The talk combines a slide-based walkthrough with pre-recorded video demonstrations of real pipelines, DAGs, dbt projects, and optimization patterns. These examples help attendees bridge theory and practice, from ingestion through transformation to delivery.
Prerequisites
This talk is ideal for beginner data engineers, BI professionals, and developers looking to optimize data workflows in a cloud-native environment using modern ELT techniques.
Learning objectives
Attendees will:
- Understand how to build and orchestrate batch & streaming pipelines using Airflow on GCP.
- Learn how to structure dbt projects for modular, testable, and version-controlled transformations.
- Apply SQL tuning techniques to improve data warehouse performance (see the partition-pruning sketch after this list).
- Implement data quality checks and governance patterns with dbt and Airflow sensors (see the sensor sketch after this list).
- See how CI/CD and DevOps strategies enhance the stability of ELT workflows.
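As a taste of the SQL tuning material, the sketch below uses BigQuery's dry-run mode to compare bytes scanned with and without a filter on a partition column. The table and column names are invented for illustration, and the snippet assumes the google-cloud-bigquery client library with default GCP credentials.

```python
# Sketch: measuring the effect of partition pruning in BigQuery with a
# dry run (nothing is executed or billed). Table and column names are
# hypothetical; the table is assumed to be partitioned on sale_date.
from google.cloud import bigquery

client = bigquery.Client()  # assumes default GCP credentials
dry_run = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

# Full scan: no filter on the partitioning column.
full_scan = client.query(
    "SELECT SUM(amount) FROM `example_project.analytics.sales`",
    job_config=dry_run,
)

# Pruned scan: filtering on the partition column lets BigQuery skip
# every partition outside the date range.
pruned_scan = client.query(
    """
    SELECT SUM(amount)
    FROM `example_project.analytics.sales`
    WHERE sale_date BETWEEN '2024-01-01' AND '2024-01-31'
    """,
    job_config=dry_run,
)

print(f"full scan:   {full_scan.total_bytes_processed:,} bytes")
print(f"pruned scan: {pruned_scan.total_bytes_processed:,} bytes")
```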
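For the data quality objective, one pattern covered in the talk is gating downstream tasks on a check. Below is a minimal sketch using the SqlSensor from Airflow's common SQL provider; the DAG name, connection ID, and query are hypothetical, and a `dbt test` step can serve the same gating role.

```python
# Sketch: block downstream tasks until today's partition is non-empty,
# using the SqlSensor from the common SQL provider. DAG name, connection
# ID, and table are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.common.sql.sensors.sql import SqlSensor

with DAG(
    dag_id="quality_gated_elt",        # hypothetical
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # The sensor succeeds once the query returns a truthy first cell.
    freshness_check = SqlSensor(
        task_id="wait_for_todays_data",
        conn_id="bigquery_default",    # hypothetical Airflow connection
        sql="""
            SELECT COUNT(*) > 0
            FROM `example_project.staging.sales`
            WHERE sale_date = '{{ ds }}'
        """,
        poke_interval=300,   # re-check every 5 minutes
        timeout=3600,        # fail after an hour without data
        mode="reschedule",   # free the worker slot between pokes
    )
```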