

Talk

Data Processing

Date: 29.10 / Start: 00:00 – Finish: 00:00

Automated Spark application tuning

In Russian

Presentation pdf

Valeria will talk about a Hadoop cluster that runs hundreds of daily and thousands of hourly Spark jobs. The jobs differ widely, and each has its own SLA. At that scale, having in-house engineers tune every job by hand is unrealistic, so the team built and deployed a fully automatic tuning system based on the logs that Spark itself writes. Valeria will show how to easily extract a lot of information from these logs offline and what to look for when automatically tuning spark.executor.memory. She will also explain in detail how the tuning system is set up and what lets it continually adjust to changes. The talk will interest those who already work with Spark and have an idea of how it is structured.
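As a rough illustration of what offline log analysis can look like, here is a minimal sketch that reads Spark event log lines (one JSON object per line) and aggregates a few memory-pressure signals per application. It is not the speaker's system. The function name `summarize_task_metrics` and the choice of signals are assumptions made for this example; the field names (`Event`, `Task Metrics`, `Executor Run Time`, `JVM GC Time`, `Memory Bytes Spilled`) are those Spark writes for `SparkListenerTaskEnd` events.

```python
import json
from typing import Iterable


def summarize_task_metrics(lines: Iterable[str]) -> dict:
    """Aggregate simple memory-pressure signals from Spark event log lines.

    Hypothetical helper for illustration only: a real tuner would look at
    many more signals before changing spark.executor.memory.
    """
    tasks = 0
    run_ms = 0
    gc_ms = 0
    spilled = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Only task-end events carry per-task metrics.
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        # Failed tasks may have no metrics at all.
        metrics = event.get("Task Metrics") or {}
        tasks += 1
        run_ms += metrics.get("Executor Run Time", 0)
        gc_ms += metrics.get("JVM GC Time", 0)
        spilled += metrics.get("Memory Bytes Spilled", 0)
    return {
        "tasks": tasks,
        # Share of executor run time spent in garbage collection.
        "gc_fraction": gc_ms / run_ms if run_ms else 0.0,
        "memory_bytes_spilled": spilled,
    }


# Synthetic sample lines standing in for a real event log file.
sample = [
    json.dumps({"Event": "SparkListenerApplicationStart", "App Name": "demo"}),
    json.dumps({"Event": "SparkListenerTaskEnd",
                "Task Metrics": {"Executor Run Time": 1000,
                                 "JVM GC Time": 250,
                                 "Memory Bytes Spilled": 0}}),
    json.dumps({"Event": "SparkListenerTaskEnd",
                "Task Metrics": {"Executor Run Time": 3000,
                                 "JVM GC Time": 750,
                                 "Memory Bytes Spilled": 4096}}),
]
summary = summarize_task_metrics(sample)
print(summary)
```

A high GC share or non-zero spill might suggest that an application is short on executor memory, while consistently low values might suggest room to shrink it; any concrete thresholds would have to come from the cluster's own history.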

#apache spark
#automation
#resource utilization

Speakers

Valeriia Dymbitskaia
oneFactor

Invited experts

Evgeny Nenakhov
MTS Digital

Other talks on «Data Processing»
- NiFi scripts as an element of Less Code ETL
  Bronislav Zhitnikov
  Tinkoff
  In Russian
- How we let users build their ETL
  Alexey Polyansky
  Tinkoff
  In Russian
- Love and hate Prefect 2.0 after Apache Airflow
  Iuliia Volkova
  Independent consultant
  In Russian
- The many faces of pandas
  Pavel Filonov
  Independent consultant
  In Russian
- Using Pentaho DI
  Konstantin Sergeev
  SMP Bank
  In Russian
- Ingest layer of the data platform: mix but do not shake
  Oleg Kochergin
  SberHealth
  In Russian
- 100 billion messages in Kafka: load and forget
  Denis Efarov
  Odnoklassniki
  In Russian