Let's Ditch Java to Massively Accelerate Spark/Presto... Or Not Yet?

Database Internals

In Russian

In this talk we will take a look at Velox, a C++ library that is a promising candidate for accelerating Presto, Spark, and other analytical data processing systems and DBMSes. We will also review some of the most mature integrations with existing popular OLAP/ML systems.

The first part gives a brief history of the Deconstructed Database concept (modular, pluggable DBMS components) and Velox's place within it.

In the second part we will talk about native execution in general: its advantages, and why vectorizing computations is beneficial (or even necessary), especially for analytical workloads.

Then we will dive into some of the problems that Meta* developers encountered while optimizing their internal infrastructure, and how they solved them: namely, by writing the Velox C++ library to accelerate various workloads across the company's infrastructure.

The last part reviews some of Velox's integrations with popular data processing systems, such as Presto, Spark, and PyTorch, as well as the current status of these integrations. We will also take a look at some of the benchmarks published by the respective developers.

Target audience: DBMS and query execution engine developers, Data Engineers.

* Meta is a prohibited organization on the territory of the Russian Federation.