Talk type: Talk

100 billion messages in Kafka: load and forget

  • Talk in Russian

Apache Kafka is a great tool for reliably passing messages between services, but offloading its contents for offline analytics has proven to be no easy task, especially when it comes to hundreds of billions of messages a day, every day. Apache Spark comes to the rescue, but unfortunately its capabilities alone are not enough for reliable, fully automated operation on really large data volumes. The speaker will explain how to offload 100 billion messages a day from Apache Kafka to HDFS and stop thinking about it.

The talk will be of interest to Big Data developers who use Kafka to transfer large amounts of data to Hadoop.

  • #kafka
  • #spark
  • #streaming
