Talk

Database Internals

Date: 13.09 / Start: 00:00 – Finish: 00:00

How We Adapted Dynamic YTsaurus Tables to Store Blobs

In Russian

Presentation (PDF)

To make YTsaurus more efficient, we decided to move blobs out and store them separately from "normal" tabular data. This required special modifications to the compaction algorithms: they had to garbage-collect the separately stored blob data and strike a suitable trade-off between extra disk space (space amplification) and the amount of data that is rewritten over and over (write amplification).

We also applied this approach to a number of tables that had been kept in RAM: by moving part of their data to disk (in the guise of blobs!), we reduced their RAM consumption several-fold while keeping read latencies low at high percentiles. Along the way, we had to substantially improve the I/O stack by switching to io_uring, and to extend the block-storage layer with a consistent hashing algorithm that decides where data replicas are placed.
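
The core idea above, keeping large values outside the rows and reclaiming their space through compaction, can be illustrated with a minimal Python sketch of generic key-value separation. It is not the YTsaurus implementation, whose internals the abstract does not describe: the class names, the 64-byte `BLOB_THRESHOLD` and the 0.5 `GARBAGE_RATIO_LIMIT` are hypothetical. Rows keep small values inline and only a reference for large ones; overwriting a large value turns the old blob into garbage, and `compact()` copies the live blobs out of every sealed blob file whose garbage ratio has reached the limit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple, Union

# Hypothetical tuning knobs (not YTsaurus settings).
BLOB_THRESHOLD = 64          # values of at least 64 bytes are stored as blobs
GARBAGE_RATIO_LIMIT = 0.5    # compact a sealed blob file once half of its bytes are garbage

BlobRef = Tuple[int, int]    # (blob file id, payload index inside that file)


@dataclass
class BlobFile:
    payloads: List[bytes] = field(default_factory=list)
    live: Set[int] = field(default_factory=set)  # indices still referenced by some row

    def total_bytes(self) -> int:
        return sum(len(p) for p in self.payloads)

    def live_bytes(self) -> int:
        return sum(len(self.payloads[i]) for i in self.live)

    def garbage_ratio(self) -> float:
        total = self.total_bytes()
        return 0.0 if total == 0 else 1.0 - self.live_bytes() / total


class BlobSeparatedTable:
    def __init__(self) -> None:
        self.rows: Dict[str, Union[bytes, BlobRef]] = {}  # small values inline, large ones by reference
        self.blob_files: Dict[int, BlobFile] = {0: BlobFile()}
        self.active_file = 0       # the only blob file that accepts appends
        self.bytes_rewritten = 0   # live bytes copied by compaction (the write amplification cost)

    def _append_blob(self, value: bytes) -> BlobRef:
        blob_file = self.blob_files[self.active_file]
        blob_file.payloads.append(value)
        index = len(blob_file.payloads) - 1
        blob_file.live.add(index)
        return (self.active_file, index)

    def put(self, key: str, value: bytes) -> None:
        old = self.rows.get(key)
        if isinstance(old, tuple):  # the overwritten blob becomes garbage
            self.blob_files[old[0]].live.discard(old[1])
        self.rows[key] = self._append_blob(value) if len(value) >= BLOB_THRESHOLD else value

    def get(self, key: str) -> bytes:
        value = self.rows[key]
        if isinstance(value, tuple):
            file_id, index = value
            return self.blob_files[file_id].payloads[index]
        return value

    def seal_active_file(self) -> None:
        self.active_file = max(self.blob_files) + 1
        self.blob_files[self.active_file] = BlobFile()

    def compact(self, limit: float = GARBAGE_RATIO_LIMIT) -> None:
        """Move live blobs out of sealed files with garbage ratio >= limit, then drop those files."""
        victims = [fid for fid, f in self.blob_files.items()
                   if fid != self.active_file and f.garbage_ratio() >= limit]
        for file_id in victims:
            for key, ref in list(self.rows.items()):
                if isinstance(ref, tuple) and ref[0] == file_id:
                    payload = self.blob_files[file_id].payloads[ref[1]]
                    self.rows[key] = self._append_blob(payload)
                    self.bytes_rewritten += len(payload)
            del self.blob_files[file_id]

    def space_amplification(self) -> float:
        stored = sum(f.total_bytes() for f in self.blob_files.values())
        live = sum(f.live_bytes() for f in self.blob_files.values())
        return stored / live if live else 1.0


if __name__ == "__main__":
    table = BlobSeparatedTable()
    table.put("a", b"x" * 100)      # blob in file 0
    table.put("b", b"y" * 100)      # blob in file 0
    table.put("small", b"tiny")     # stays inline in the row
    table.seal_active_file()        # file 0 is sealed, file 1 becomes active
    table.put("a", b"z" * 100)      # the old "a" blob in file 0 becomes garbage
    print(table.space_amplification())  # 1.5: 300 bytes stored, 200 live
    table.compact()                 # file 0 is 50% garbage, so its live blob is copied
    print(table.space_amplification(), table.bytes_rewritten)  # 1.0 100
```

Lowering `GARBAGE_RATIO_LIMIT` reclaims disk space sooner (lower space amplification) at the cost of copying more live data (a larger `bytes_rewritten`, i.e. higher write amplification); raising it does the opposite. This is the kind of trade-off the modified compaction has to balance.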

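For replica placement, the abstract only says that a consistent hashing algorithm was added to the block-storage layer. A classic hash ring with virtual nodes is one common variant; the Python sketch below assumes it, and `HashRing`, the 64 virtual nodes per storage node and the SHA-256-based ring positions are illustrative choices, not YTsaurus code. A chunk's replicas are the first `count` distinct nodes met when walking clockwise from the chunk's hash.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def ring_position(label: str) -> int:
    """Map a label to a 64-bit ring position (first 8 bytes of its SHA-256 digest)."""
    return int.from_bytes(hashlib.sha256(label.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: Iterable[str], vnodes: int = 64) -> None:
        self.vnodes = vnodes                    # virtual nodes per physical node
        self._ring: List[Tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def replicas(self, chunk_id: str, count: int) -> List[str]:
        """Return `count` distinct nodes met walking clockwise from the chunk's position."""
        if count > len({node for _, node in self._ring}):
            raise ValueError("not enough nodes for the requested replica count")
        # (position,) sorts before every (position, node), so this finds the first entry >= position.
        i = bisect.bisect(self._ring, (ring_position(chunk_id),))
        result: List[str] = []
        while len(result) < count:
            _, node = self._ring[i % len(self._ring)]
            if node not in result:
                result.append(node)
            i += 1
        return result


if __name__ == "__main__":
    ring = HashRing([f"node{i}" for i in range(5)])
    chunks = [f"chunk-{i}" for i in range(1000)]
    before = {c: ring.replicas(c, 3) for c in chunks}
    ring.add_node("node5")
    moved = sum(set(ring.replicas(c, 3)) != set(before[c]) for c in chunks)
    print(f"{moved} of {len(chunks)} chunks swapped one replica for node5")
```

The property that makes this attractive for replica placement is locality of change: adding a node can only swap one of a chunk's replicas for the new node, and removing it restores the previous placement, so most data stays where it is when the cluster changes.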
#ytsaurus
#blob
#storage

Speakers

Maksim Babenko
Yandex

Invited experts

Vladimir Sitnikov
PostgreSQL JDBC committer

Other talks on «Database Internals»
- Speed-up queries: How to Cook ClickHouse Well-done
  Kuzma Leshakov
  Yandex Cloud
  Room 1, in Russian
- ACID Transactions in Apache Cassandra 5.0
  Aleksandr Volochnev
  DataStax
  In Russian
- Compression, encryption and more: changing the behavior and guarantees of a distributed database
  Anton Vinogradov
  Apache Software Foundation
  Room 1, in Russian
- Scheduling a Billion of Tasks per Day
  Ignat Kolesnichenko
  YTsaurus
  In Russian
- Moving Towards Universality: A Hybrid OLTP Database with OLAP Query Support
  Aleksei Dmitriev
  Yandex
  Room 2, in Russian
- Fast data processing in Data Lake with Trino
  Vladimir Ozerov
  Querify Labs
  Room 1, in Russian
- A distributed SQL query engine for data analytics
  Alexey Ozeritskii
  Yandex
  Room 2, in Russian
- Predictive Analysis of Parasitic Load on GreenPlum Clusters
  Mark Lebedev
  GlowByte Consulting
  Pavel Ternyuk
  Data Sapience
  Room 2, in Russian
- Application of TLA+ for Efficient Testing of Distributed Systems
  Nikita Siniachenko
  VK
  Evgenii Chernatskiy
  VK
  Room 3, in Russian
- What it takes to achieve linearizability in a distributed system
  Sergey Petrenko
  Tarantool
  Room 3, in Russian
- Deep Dive Into Query Performance
  Peter Zaitsev
  Percona
  In Russian