Scio — data processing at Spotify

EN / Day 3 / 19:00 / Hall 3

Scio is an open source Scala API for Apache Beam and Google Cloud Dataflow.

It was created at Spotify to process petabytes of data in both batch and streaming modes, and it has since been adopted by dozens of other companies.

We'll talk about the evolution of big data at Spotify, from Python, Hadoop, Hive, Storm, and Scalding to today's world of cloud and serverless computing. We'll look behind the scenes at some classic use cases, e.g. Discover Weekly and Wrapped, and at the challenges the company faced.

We'll also talk about some features that make Scio stand out from other Scala big data frameworks, including its use of Algebird, macros, shapeless, magnolia, and more to make large-scale data processing easier, safer, and faster.


Neville Li
Spotify

Neville is a software engineer at Spotify who works mainly on data infrastructure for machine learning and advanced analytics. In the past few years he has been driving the adoption of Scala and new data tools for music recommendation, including Scalding, Spark, Storm, and Parquet. Before that, he worked on search quality at Yahoo! and on old-school distributed systems such as MPI.

Venue