Open Table Formats in the Wild™ – Reloaded: Vortexing Ducks over Floating Icebergs
Apache Iceberg and Parquet are foundational to the modern data stack, promising engine interoperability, ACID guarantees, and freedom from vendor lock-in. But can they meet modern demands such as change data capture (CDC), low-latency streaming, and point lookups for AI workloads?
This intermediate-level talk examines how Iceberg actually performs in the real world for organizations that are not tech giants like Netflix. We discuss why incremental processing is not native to Iceberg, how its metadata layer limits low-latency streaming, and why Parquet's physical layout bottlenecks the random-access patterns of AI workloads. Finally, we introduce DuckLake and Vortex as emerging next-generation alternatives for table and file formats, respectively.
Prerequisites
Attendees should have intermediate experience in data engineering and already understand the fundamentals of open table formats.
Learning Objectives
- Why incremental processing is not native to Apache Iceberg.
- How Iceberg's metadata creates hard limits for low-latency streaming.
- Why Parquet's physical layout bottlenecks point lookups and modern AI access patterns.
- What DuckLake and Vortex offer, with an early look at these emerging alternatives for table and file formats.