

Talk

Date: 14.09 / Start: 00:00 – Finish: 00:00

How to Process Data with Spark in the Cloud

In Russian · Complexity: -

Presentation (PDF)

How can you build a data processing pipeline using cloud services (DataProc and DataSphere), work with a Spark cluster from Jupyter notebooks, and why is it convenient to do this in managed services? How can you teach the system to spin up a cluster for you exactly when you need it, and save money along the way? What challenges do companies face when migrating, and what solutions do they find? What are the peculiarities of cloud services? What should you be prepared for, and what improvements might be needed?

#spark
#cloud
#real_life

Speakers

Dmitry Ribalko
Yandex Cloud
Maksim Zinal
Yandex Cloud

Other talks on «DataOps»
- Hadoop in the Cloud is OK
  Mikhail Maryufich
  Odnoklassniki
  Room 1 · In Russian · Complexity: -
- From Raw Clickstream to Pure Datasets, or the History of Feature Storage Development at Lamoda
  Mikhail Nesterov
  Lamoda Tech
  Dana Zlochevskaia
  Lamoda Tech
  In Russian · Complexity: -
- How We Migrated from PostgreSQL to a Data Lake on AWS
  Nick Zelenskii
  Whoosh
  Konstantin Malykhin
  Whoosh
  Pavel Sivokhin
  Whoosh
  In Russian · Complexity: -
- Developing DataOps.BI, a BI Analytics Tool Based on the Open-Source Apache Superset
  Pavel Shestakov
  MTS Digital
  Room 2 · In Russian · Complexity: -
- The Model Serving Journey: from Flask to Our Own Platform
  Alina Kocheva
  Positive Technologies
  Room 3 · In Russian · Complexity: -
Other talks on «Architecture of Data Platforms»
- Examples of Real Analytical Solutions and Data Teams in Western Companies
  Dmitry Anoshin
  Surfalytics
  In Russian · Complexity: -
- Data Depersonalization Methods
  Aleksei Danshin
  Neoflex
  Room 2 · In Russian · Complexity: -
- Streaming Data Integration: an ETL Tool for Building Near-Real-Time Processes
  Vasilii Melnik
  GlowByte
  Room 3 · In Russian · Complexity: -
- Simulation of Event Flows in an Evolving Environment
  Nikolay Golov
  ManyChat
  In Russian · Complexity: -
- What to Do if Your DWH Is Growing Too Fast
  Alexandr Filatov
  Avito
  In Russian · Complexity: -
- Platform as a Product: Develop and Implement a Complex Technological Solution Internally
  Maxim Bartenev
  MTS Digital
  Dmitry Bodin
  MTS Digital
  Nadzhim Mokhammad
  MTS Digital
  In Russian · Complexity: -
- How to Bring Order to Product Event Logging
  Alexey Balekhov
  Okko
  In Russian · Complexity: -
- ML System Design Interview
  Pavel Filonov
  Independent consultant
  Arkadiy Vasilenko
  Odnoklassniki
  In Russian · Complexity: -
- Building a Suite of Services for Satellite Image Analysis Using ML
  Sergey Cosmos
  SR Data
  Room 1 · In Russian · Complexity: -
- A Data Management Platform Built Around YTsaurus
  Vladimir Verstov
  Yandex Go
  Room 2 · In Russian · Complexity: -
- I’ll Change the Way You Look at Data Storage in 30 Minutes
  Maksim Statsenko
  Yandex
  Room 1 · In Russian · Complexity: -
- Building Disaster-Resistant Data Warehouses
  Aleksandr Tarasov
  Arenadata
  Room 2 · In Russian