
How to design a high-performance distributed SQL engine

Day 2


Distributed SQL engines must process data spread across multiple servers. In this talk, Vladimir will use Apache Flink and Presto as examples to explain how distributed SQL engines are structured and which approaches they use to improve query performance.

During this session we'll see:

  • the architecture of distributed relational operators, such as aggregate, sort, and join;
  • partitioning data in a cluster to minimize data transfer between nodes;
  • use of cost-based optimizers to find optimal execution plans;
  • splitting complex plans into independent fragments, and organizing data transfer between them;
  • advanced techniques: compilation, vectorization, pruning.

Tags: #queryoptimization #queryengine #tooling
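The abstract names these topics without describing a specific algorithm. The sketch below is therefore a generic illustration only, not taken from the talk or from the Flink or Presto code bases. It is a minimal Python sketch of two-phase (partial/final) hash aggregation, which touches on the first two bullets: each node pre-aggregates its local rows, the partial results are hash-partitioned by key so that every key lands on exactly one node, and that node merges them. The "nodes" are simulated as in-memory lists, and every function name here is hypothetical.

```python
from collections import defaultdict


def partial_aggregate(rows):
    """Phase 1 (per node): sum values per key locally, so fewer rows are shuffled."""
    acc = defaultdict(int)
    for key, value in rows:
        acc[key] += value
    return list(acc.items())


def shuffle(partials_per_node, num_nodes):
    """Hash-partition partial results: key k is routed to node hash(k) % num_nodes."""
    buckets = [[] for _ in range(num_nodes)]
    for partials in partials_per_node:
        for key, value in partials:
            buckets[hash(key) % num_nodes].append((key, value))
    return buckets


def final_aggregate(rows):
    """Phase 2 (per node): merge the partial sums that arrived for this node's keys."""
    acc = defaultdict(int)
    for key, value in rows:
        acc[key] += value
    return dict(acc)


def distributed_sum(node_data):
    """Simulate SELECT key, SUM(value) ... GROUP BY key over len(node_data) nodes."""
    num_nodes = len(node_data)
    partials = [partial_aggregate(rows) for rows in node_data]
    buckets = shuffle(partials, num_nodes)
    result = {}
    for bucket in buckets:
        # Keys are disjoint across buckets, so merging the per-node results is safe.
        result.update(final_aggregate(bucket))
    return result


if __name__ == "__main__":
    node_data = [
        [("a", 1), ("b", 2), ("a", 3)],  # rows stored on node 0
        [("b", 4), ("c", 5)],            # rows stored on node 1
        [("a", 10)],                     # rows stored on node 2
    ]
    print(sorted(distributed_sum(node_data).items()))  # [('a', 14), ('b', 6), ('c', 5)]
```

In this toy input, the local pre-aggregation step shrinks the six raw rows to five partial rows before the shuffle. On real data with many duplicate keys per node, the saving in network transfer is usually much larger.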