Talks

The program has not been finalized yet, so changes are still possible.

Talk
What Metastore Is
What a metastore is, how it fits into the big data ecosystem, which solutions exist on the market, and why we decided to develop our own. I will share our practical experience, the architecture, and the lessons we have learned.
- Mikhail Ivanov
  Positive Technologies
In Russian
Talk
Spark Connect: A New Approach to Working with Apache Spark
I will talk about Spark Connect, a new approach to working with Apache Spark that lets you develop the client side of an application in any language without depending on the JVM. We will cover the architecture of Spark Connect and how it differs from classic Spark. You will also learn about a project in which we used the Spark Connect API for C++.
- Aleksandr Tokarev
  Yandex
In Russian
Talk
DataRentgen: What’s Wrong With OSS Data Catalog and How To Make It Better
The story of developing an open source data lineage solution based on OpenLineage + Kafka + FastStream + FastAPI. I will compare it with other open source solutions (OpenMetadata, DataHub, Marquez, OpenAtlas) and explain why we abandoned them in favor of our own development. No, this is not another custom Data Catalog :)
- Maxim Martynov
  MTS Web Services (MWS)
In Russian
Talk
StarRocks: the Reality of the Modern Data Platform
Our company's data platform has existed for more than five years, and during this time it has absorbed many trendy (and not so trendy) solutions. I will tell you how we tried to choose our future among ClickHouse, Greenplum, and Trino, and found StarRocks.
- Stanislav Lysikov
In Russian
Talk
How Challenging Times Forced Us To Build Better BI
How we at T-Bank built our BI tool on Apache Superset, rebuilt our BI culture, created synergy between BI analysts and the developers of our BI tool, and successfully migrated from Tableau.
- Ekaterina Shcherbakova
  T-Bank
In Russian
Talk
Ways To Organize CDC in PostgreSQL and Why Debezium out of the Box Will Not Solve All Problems
Getting change events from sources is a common task that can be solved in different ways. One such solution is Debezium. But is it really that simple, and is it always the best choice? I will try to answer these questions and look at Debezium in terms of the difficulties that arise when implementing change data capture.
- Nikita Rianov
In Russian
Talk
Third-Party Runtime Engines for Apache Spark: Practical Experience
Our experience with the Comet and Gluten (Velox) execution engines, from getting started and the specifics of the build to the results of testing on real ETL jobs. I will cover pitfalls and non-obvious points, show the results, and consider the cases where these engines are useful and where they don't work at all.
- Nikita Blagodarnyi
  Chestnyj znak
In Russian
Talk
Hadoop Is Not Dead — Just Secure!
The story of how a small team of engineers implemented Hadoop with full Kerberos and Ranger-based security without stopping business processes.
- Antony Aleksandrov
  Detsky Mir
In Russian
Talk
Vector Search Algorithms in Modern Databases
A detailed review of the vector search algorithms most widely used in modern database management systems.
- Alexander Zevaykin
  YDB
In Russian
Talk
Vector Search Algorithms in YDB
YDB has come a long way from applying basic vector search techniques to building a scalable and efficient vector index. The talk presents a detailed analysis of the stages in the evolution of vector search in YDB, including the complexities involved and the engineering solutions.
- Alexander Zevaykin
  YDB
In Russian
Talk
How We Improved Data Management Processes in Airflow: Practical Cases
I'll tell you how we use Airflow in practice: from the pain of sensors to the convenience of datasets, from static DAGs spread across a bunch of files to dynamic ones, and from standard features to our own custom solutions, which should interest anyone who deals with the day-to-day operation of Airflow.
- Dmitrii Morozov
  Innovation Center "Safe Transport"
In Russian
Talk
How Yandex Market Storage Started Writing Documentation for Objects
How Yandex Market started documenting its storage objects: how it happened and what problems the company faced. We will look at different approaches to describing metadata in data storage systems, compare them, and decide whether this path is worth taking.
- Pavel Kolodkin
  Yandex Market
In Russian
Talk
Apache Spark SQL. Extend and Manage
How to configure and adapt Apache Spark to your tasks without rewriting the framework. I will talk about approaches to extending the functionality of Spark SQL without touching the platform's source code. You will learn about creating your own data sources, developing user-defined functions for specialized processing, and implementing optimization rules that adapt to various queries.
- Dmitrii Vertlib
  Chestnyj znak
In Russian

We will add more talks soon.