

Talk

Database Internals

Date: 13.09

A distributed SQL query engine for data analytics

In Russian · Complexity: -


The YQL service provides access to data storage and processing systems through a dialect of SQL. Initially, SQL queries were executed as Map/Reduce operations in YTsaurus.

This scheme is simple and robust, but its shortcomings led to the development of YQL's own query execution engine.

The engine divides a query into stages and each stage into tasks. Each task runs on a cluster node, and tasks pass the results of their computations to one another over the network. This differs from the Map/Reduce approach, in which data is passed between stages by writing it to disk. Notable features of the engine include cross-cluster queries (for example, a single query can reference tables from both ClickHouse and YTsaurus clusters) and support for user-defined functions written in various programming languages.
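As a rough illustration of that difference, here is a Python sketch of a two-stage query. Stage 1 tasks filter their input partitions and hash-partition rows to stage 2 tasks, which aggregate rows as they arrive. Threads stand in for cluster nodes and in-memory queues stand in for network channels. The function names, the filter, and the partitioning scheme are assumptions made for illustration; this is not the engine's actual API.

```python
import queue
import threading
from collections import defaultdict

END = object()  # end-of-stream marker sent over a channel


def source_task(rows, channels):
    """Stage 1 task: filter its input partition and hash-partition rows
    to stage 2 tasks through in-memory channels (stand-ins for network links)."""
    for key, value in rows:
        if value > 0:  # example filter: keep positive values only
            channels[hash(key) % len(channels)].put((key, value))
    for ch in channels:
        ch.put(END)  # tell every consumer this producer is done


def aggregate_task(channel, num_producers, results, lock):
    """Stage 2 task: sum values per key as rows stream in (no intermediate disk writes)."""
    totals = defaultdict(int)
    finished = 0
    while finished < num_producers:
        item = channel.get()
        if item is END:
            finished += 1
            continue
        key, value = item
        totals[key] += value
    with lock:
        results.update(totals)  # each key is owned by exactly one aggregator


def run_query(partitions, num_agg_tasks=2):
    """Run both stages concurrently; stage 2 consumes while stage 1 produces."""
    channels = [queue.Queue() for _ in range(num_agg_tasks)]
    results, lock = {}, threading.Lock()
    threads = [threading.Thread(target=source_task, args=(p, channels)) for p in partitions]
    threads += [
        threading.Thread(target=aggregate_task, args=(ch, len(partitions), results, lock))
        for ch in channels
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, `run_query([[("a", 1), ("b", 2)], [("a", 3), ("b", -1), ("c", 5)]])` returns `{"a": 4, "b": 2, "c": 5}`. Rows flow directly from producer tasks to consumer tasks, whereas a Map/Reduce job would first materialize the stage 1 output on disk.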

The engine is available as an open-source library within the YDB Platform project. The library provides primitives for working with the query's abstract syntax tree (AST), computational primitives, and a set of microservices for running and managing tasks on a cluster. The library currently runs in three installations: the internal YQL service, YDB in Yandex Cloud, and Yandex Query in Yandex Cloud.
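The abstract does not describe the library's AST primitives in detail. As a hedged illustration of the kind of work such primitives do, this Python sketch parses a query tree written as an s-expression and applies one bottom-up rewrite (constant folding). The node names `Filter` and `Read` and the folding rule are hypothetical and are not taken from YQL.

```python
from typing import List, Union

Node = Union[str, int, List["Node"]]  # an atom or a list of child nodes


def tokenize(text: str) -> List[str]:
    """Split an s-expression into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: List[str]) -> Node:
    """Recursive-descent parse of one s-expression; consumes tokens from the front."""
    tok = tokens.pop(0)
    if tok == "(":
        node: List[Node] = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return node
    return int(tok) if tok.lstrip("-").isdigit() else tok


def fold_constants(node: Node) -> Node:
    """Bottom-up rewrite: replace (+ ...) or (* ...) with an int when all args are literals."""
    if not isinstance(node, list):
        return node
    node = [fold_constants(child) for child in node]  # rewrite children first
    if node and node[0] in ("+", "*") and all(isinstance(a, int) for a in node[1:]):
        result = 0 if node[0] == "+" else 1
        for a in node[1:]:
            result = result + a if node[0] == "+" else result * a
        return result
    return node
```

For instance, folding `parse(tokenize("(Filter (Read table) (> x (+ 1 2)))"))` yields `["Filter", ["Read", "table"], [">", "x", 3]]`: the arithmetic subtree is evaluated once at planning time instead of once per row.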

#ytsaurus
#query_engine
#architecture

Speakers

Alexey Ozeritskii
Yandex

Invited experts

Aleksei Dmitriev
Yandex

Other talks on «Database Internals»
- Speed-up queries: How to Cook ClickHouse Well-done
  Kuzma Leshakov
  Yandex Cloud
  Room 1 · In Russian · Complexity: -
- ACID Transactions in Apache Cassandra 5.0
  Aleksandr Volochnev
  Datastax
  In Russian · Complexity: -
- How We Adapted Dynamic YTsaurus Tables to Store Blobs
  Maksim Babenko
  Yandex
  Room 1 · In Russian · Complexity: -
- Compression, encryption and more: changing the behavior and guarantees of a distributed database
  Anton Vinogradov
  Apache Software Foundation
  Room 1 · In Russian · Complexity: -
- Scheduling a Billion of Tasks per Day
  Ignat Kolesnichenko
  YTsaurus
  In Russian · Complexity: -
- Moving Towards Universality: A Hybrid OLTP Database with OLAP Query Support
  Aleksei Dmitriev
  Yandex
  Room 2 · In Russian · Complexity: -
- Fast data processing in Data Lake with Trino
  Vladimir Ozerov
  Querify Labs
  Room 1 · In Russian · Complexity: -
- Predictive Analysis of Parasitic Load on GreenPlum Clusters
  Mark Lebedev
  GlowByte Consulting
  Pavel Ternyuk
  Data Sapience
  Room 2 · In Russian · Complexity: -
- Application of TLA+ for Efficient Testing of Distributed Systems
  Nikita Siniachenko
  VK
  Evgenii Chernatskiy
  VK
  Room 3 · In Russian · Complexity: -
- What it takes to achieve linearizability in a distributed system
  Sergey Petrenko
  Tarantool
  Room 3 · In Russian · Complexity: -
- Deep Dive Into Query Performance
  Peter Zaitsev
  Percona
  In Russian · Complexity: -