Talk type: Talk
A distributed SQL query engine for data analytics
The YQL service provides access to data storage and processing systems through a SQL dialect. Initially, SQL queries were executed as Map/Reduce operations in the YTsaurus system.
This simple and robust scheme has a number of shortcomings, which led to the development of YQL's own query execution engine.
The engine divides a query into stages and each stage into tasks. Each task runs on a cluster node, and tasks pass the results of their computations to each other over the network. This differs from the Map/Reduce approach, in which data is transferred between stages by writing it to disk. Notable features of the engine include cross-cluster queries (for example, a single query can reference tables from both ClickHouse and YTsaurus clusters) and the ability to execute user-defined functions written in various programming languages.
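The stage/task model described above can be illustrated with a small single-process sketch. It is not the engine's code: the `Stage` class and `run_pipeline` function are hypothetical, threads stand in for cluster nodes, and in-memory `queue.Queue` objects stand in for network channels. Each task hash-partitions its output across the tasks of the next stage, so rows stream from stage to stage while all tasks run concurrently, instead of being materialized on disk between stages as in Map/Reduce.

```python
from __future__ import annotations

import queue
import threading
from dataclasses import dataclass
from typing import Callable, Iterable

# Sentinel marking the end of one producer's stream on a channel.
_END = object()


@dataclass
class Stage:
    """A hypothetical plan stage: a per-row transform run by `parallelism` tasks."""
    name: str
    fn: Callable[[int], Iterable[int]]
    parallelism: int


def run_pipeline(rows: list[int], stages: list[Stage]) -> list[int]:
    # inputs[i][t] is the input channel of task t in stage i
    # (an in-memory queue standing in for a network channel).
    inputs = [[queue.Queue() for _ in range(s.parallelism)] for s in stages]
    sink: queue.Queue = queue.Queue()

    def task(stage_idx: int, task_idx: int) -> None:
        stage = stages[stage_idx]
        in_q = inputs[stage_idx][task_idx]
        # Stage 0 is fed by a single source; later stages by every upstream task.
        producers = 1 if stage_idx == 0 else stages[stage_idx - 1].parallelism
        outs = inputs[stage_idx + 1] if stage_idx + 1 < len(stages) else [sink]
        finished = 0
        while finished < producers:
            item = in_q.get()
            if item is _END:
                finished += 1
                continue
            for out in stage.fn(item):
                # Hash partitioning: route each row to one downstream task.
                outs[hash(out) % len(outs)].put(out)
        # Tell every downstream consumer that this task is done.
        for out_q in outs:
            out_q.put(_END)

    threads = [
        threading.Thread(target=task, args=(i, t))
        for i, s in enumerate(stages)
        for t in range(s.parallelism)
    ]
    for th in threads:
        th.start()

    # Source: spread input rows round-robin over the first stage's tasks.
    first = inputs[0]
    for n, row in enumerate(rows):
        first[n % len(first)].put(row)
    for q in first:
        q.put(_END)

    # Collect results until every task of the last stage has finished.
    results: list[int] = []
    remaining = stages[-1].parallelism
    while remaining:
        item = sink.get()
        if item is _END:
            remaining -= 1
        else:
            results.append(item)
    for th in threads:
        th.join()
    return results


stages = [
    Stage("filter_even", lambda x: [x] if x % 2 == 0 else [], parallelism=3),
    Stage("square", lambda x: [x * x], parallelism=2),
]
# Output order depends on thread scheduling, so sort before printing.
print(sorted(run_pipeline(list(range(10)), stages)))  # [0, 4, 16, 36, 64]
```

The downstream "square" tasks can start working as soon as the first filtered row arrives; nothing waits for a whole stage to finish, which is the main contrast with disk-based Map/Reduce. A real engine also has to handle scheduling across nodes, back-pressure, and failures, none of which this sketch models.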
The engine is available as an open-source library within the YDB Platform project. The library provides primitives for working with the query's abstract syntax tree (AST), computational primitives, and a set of microservices for running and managing tasks on the cluster. It is currently used in three installations: the internal YQL service, YDB in Yandex Cloud, and Yandex Query in Yandex Cloud.