Lambda Architecture: Implementing the Speed Layer with Storm and Spark Streaming
The Lambda Architecture (LA) enables developers to build large-scale, distributed data processing systems in a human-fault-tolerant way. The LA has three layers: (1) the batch layer, which manages the master dataset and pre-computes batch views; (2) the serving layer, which indexes batch views so that they can be queried with low latency; and (3) the speed layer, which deals with recent data only, processing the incoming data online.
In this talk we focus on how to implement the speed layer using two prominent Apache projects: Storm and Spark Streaming. We will weigh their pros and cons, discuss use cases, and demonstrate concrete examples in this domain.
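The interplay of the three layers can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions, not how Storm or Spark Streaming implement it: the page-view events, the names `compute_batch_view`, `update_realtime_view`, and `query`, and the in-memory counters are all hypothetical stand-ins for a real master dataset, batch job, and stream processor.

```python
from collections import Counter

# Hypothetical page-view events as (timestamp, page) pairs; illustrative data only.
master_dataset = [(1, "home"), (2, "about"), (3, "home")]  # already covered by the last batch run
recent_events = [(4, "home"), (5, "contact")]              # arrived after the last batch run


def compute_batch_view(events):
    """Batch layer: recompute the view from the entire master dataset."""
    return Counter(page for _, page in events)


def update_realtime_view(view, event):
    """Speed layer: incrementally fold one new event into the realtime view."""
    _, page = event
    view[page] += 1
    return view


def query(page, batch_view, realtime_view):
    """Query time: merge the batch view and the realtime view."""
    return batch_view.get(page, 0) + realtime_view.get(page, 0)


batch_view = compute_batch_view(master_dataset)

realtime_view = Counter()
for event in recent_events:
    update_realtime_view(realtime_view, event)

print(query("home", batch_view, realtime_view))
```

The point of the sketch is the division of labour: the batch view is always rebuilt from scratch, while the realtime view only covers events the batch run has not yet absorbed, and a query combines the two.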
Michael is MapR's Chief Data Engineer, where he helps people tap the potential of Big Data by bridging the technical side (reliability, scalability, etc.) and the business side (RoI, TCO, etc.). His background is in large-scale data integration, the Internet of Things, and Web applications, and he is experienced in advocacy and standardisation (World Wide Web Consortium). Michael shares his experience with the Lambda Architecture and distributed systems through blog posts and public speaking engagements and is also a contributor to Apache Drill. Prior to MapR, Michael was a Research Fellow at the National University of Ireland, Galway.
// Fabian Wilckens
is Senior Solutions Architect DACH at MapR Technologies. Fabian has worked on a range of IT architecture topics for many years, has worked for well-known companies, and has planned and implemented enterprise solutions worldwide.