Talk type: Talk

How to Process Data with Spark in the Cloud

  • Talk in Russian
Presentation (PDF)

How can you build a data processing pipeline on cloud services (DataProc and DataSphere), set up interaction with a Spark cluster from Jupyter notebooks, and why is it convenient to do this with managed services? How can you teach the system to spin up the cluster for you exactly when you need it, and save money in the process? What challenges do companies face when migrating to the cloud, and what solutions do they find? What are the peculiarities of cloud services? What should you be prepared for, and what improvements might be needed?

  • #spark
  • #cloud
  • #real_life