Schedule

Times in the program are shown in your time zone.

Full program published

All talks are now available — you can plan your visit.

The program may change — subscribe for updates.

September 23
- Talk. Start: 00:00 – Finish: 00:00
  How to Teach LLM to Work with Data Instead of Just Writing Plausible SQL
  How to teach an LLM not just to write plausible SQL, but to actually work with corporate data: find the right sources, understand metrics, write ETL, and validate your own answers.
  - Maksim Statsenko
    Yandex
  In Russian
  - AI agents
- Talk. Start: 00:00 – Finish: 00:00
  What Happens Between SELECT and DATA
  Let's explore what actually happens between a request and a result.
  - Petr Gurinov
    Yandex Cloud
  In Russian
  - Databases
- Talk. Start: 00:00 – Finish: 00:00
  How Platformization and AI Are Changing the Analytics Development Lifecycle: T-Bank’s Experience
  A talk about why a collection of fragmented tools stops working at the scale of a large Data Platform, and why the platform should be viewed as a unified ADLC rather than a set of separate services. I will show how this affects ETL, ad hoc development, Data Governance, Data Quality, and metrics, and why AI and the agent-based approach are becoming the main drivers of new platform requirements.
  - Dmitrii Rudnev
    T-Bank
  In Russian
  - DG + DQ
- Talk. Start: 00:00 – Finish: 00:00
  LLM Under Load: How to Measure the Performance of Self-Hosted Models
  In this talk, I will analyze a practical approach to measuring self-hosted LLM performance.
  - Roman Peskov
    Cian
  In Russian
  - ML/LLMOps
- Talk. Start: 00:00 – Finish: 00:00
  Kafka News: KRaft, Queues, Tiered Storage (And a Bit About YDB)
  Parallel reading from Kafka topics, KRaft, server balancing, and tiered storage.
  - Andrey Serebryanskiy
    Yandex Cloud
  In Russian
  - DE/ETL
- Talk. Start: 00:00 – Finish: 00:00
  S3, HDFS, POSIX... or All of the Above? Building a Data Lake the Chinese Way with CubeFS
  We'll dive into the architectural decisions behind CubeFS that make it possible to build exabyte-scale storage for ML and analytics workloads. Topics include its high-performance, horizontally scalable metadata service, local and distributed caching, transparent data movement across storage tiers, and other key design features.
  - Ivan Arkhipov
  In Russian
  - Platforms and Storage Systems
- Talk. Start: 00:00 – Finish: 00:00
  State of Iceberg REST Catalogs: What We're Missing and How to Make a DIY Control Plane
  Let's talk about what important functions are needed to manage Iceberg tables and the role of REST Catalog in this.
  - Vitaliy Moiseev
    Ostrovok!
  In Russian
  - Platforms and Storage Systems
- Talk. Start: 00:00 – Finish: 00:00
  Is There Life After dbt?
  In the talk, I will review the current state of the data transformation ecosystem, as well as alternative tools and promising projects that may replace dbt.
  - Alexandra Popova
    Positive Technologies
  In Russian
  - DE/ETL
- Talk. Start: 00:00 – Finish: 00:00
  PostgreSQL Performance Diagnostics, or the Detective Called "Something's Slowing Down the Database"
  The talk focuses on practical PostgreSQL performance diagnostics for backend developers who maintain their databases independently and do not have a dedicated DBA.
  - Stepan Fomichev
    Yandex Cloud
  In Russian
  - Databases
- Talk. Start: 00:00 – Finish: 00:00
  LLM Ops: Optimization of Inference and ML-serving in a Real Production Cluster
  The talk is about practical experience in optimizing inference and ML-serving based on GPUStack in the production environment of the corporate AI Portal.
  - Dmitry Ibragimov
    Lemana Tech
  In Russian
  - ML/LLMOps
- Talk. Start: 00:00 – Finish: 00:00
  Data Marts on Data Lakehouse: A Major Migration from Greenplum 6
  We are going to discuss the real experience of migrating data marts from a monolithic solution based on Greenplum 6 to the Data Lakehouse stack, paying attention to how to make this process the least painful for users. You will learn what non-obvious problems you will have to face and how to build processes so that the new architecture is more efficient than the legacy solution, rather than its less productive copy.
  - Artemii Naumov
    Lemana Tech
  In Russian
  - Platforms and Storage Systems
- Talk. Start: 00:00 – Finish: 00:00
  Knowledge Graph as an Infrastructure for AI Agents: From Datasets to a Single Graph
  I will tell you how we built a single knowledge graph on top of dozens of disparate corporate datasets — an infrastructure where an AI agent doesn't guess an answer based on similar chunks, but consciously navigates the structure and relationships of data.
  - Aleksandr Nepochatykh
    Sber
  In Russian
  - ML/LLMOps
- Talk. Start: 00:00 – Finish: 00:00
  Vector Search in PostgreSQL: pgvector Under the Hood
  How pgvector works: vector storage, HNSW and IVFFlat algorithms, performance degradation points. An honest breakdown of where the solution holds up and where it doesn't.
  - Daria Barsukova
    Postgres Pro
  In Russian
  - Databases
- Talk. Start: 00:00 – Finish: 00:00
  Datapipe — Data Transformation Using K8s and S3
  How we learned to use Python, K8s, and S3 to process data efficiently in the cloud.
  - Sergey Zakharchenko
    EPOCH8
  In Russian
  - DE/ETL
- Talk. Start: 00:00 – Finish: 00:00
  Buying More or Keep On Using: How an AI Assistant Became Ready for Production
  I'll show you how to estimate memory and KV-cache usage and how inference-layer solutions change the load profile, and then we'll move on to our implementation in Deckhouse.
  - Alexander Podmoskovniy
    Flant
  In Russian
  - ML/LLMOps
- Talk. Start: 00:00 – Finish: 00:00
  Migration of Data Management Tools to OMD at Magnit
  I will tell you how we built the Magnit Data ecosystem, where the catalog, glossary, DQ engine, dashboards, and chatbot work as one mechanism.
  - Oleg Molchanov
    Magnit
  In Russian
  - DG + DQ
- Start: 00:00 – Finish: 00:00
  Networking and Afterparty
September 24
- Talk. Start: 00:00 – Finish: 00:00
  Postmortem Comparisons of Agentic and Classical AutoML: Typical Pitfalls of the Agentic Approach
  I will analyze the components of success and failure and provide a practical checklist that will help you quickly decide whether you need an agent or a classic AutoML model to generate a baseline model.
  - Valeriia Dymbitskaia
    Upgini
  In Russian
  - AI agents
- Talk. Start: 00:00 – Finish: 00:00
  Data Streaming Lakehouse: How to Stream Data Into Paimon and Not Drown
  The talk focuses on practical experience in building a Data Streaming Lakehouse for near-real-time analytics using a stack of MySQL, Flink, Paimon, HDFS, and StarRocks.
  - Kirill Romanikhin
    Place.01
  In Russian
  - Platforms and Storage Systems
- Talk. Start: 00:00 – Finish: 00:00
  The MDM That Stores Nothing: How to Match Data Without Centralizing It
  A classic MDM system often assumes that data needs to be brought together in one place: loaded, normalized, matched, assigned a golden record, and then managed centrally as master data. But what do you do when, due to security or regulatory requirements, the system is not allowed to store data within its own perimeter?
  - Iurii Goryntsev
    Arenadata Catalog
  In Russian
  - DG + DQ
- Talk. Start: 00:00 – Finish: 00:00
  Reading Faster Than Ceph Can Serve: How We Built S3 Sharding With No Extra Infrastructure
  Our Trino storage hit the performance ceiling of a single Ceph cluster — so we started spreading every table across several clusters at once, hiding all the sharding logic in the HAProxy sidecars on our compute nodes, without adding a single new component to the architecture. Reads sped up from 20 to 60–80 GB/s, and GET latency dropped from minutes to 1–2 seconds.
  - Dmitrii Listvin
    Avito
  In Russian
  - Platforms and Storage Systems
- Talk. Start: 00:00 – Finish: 00:00
  From Text-to-SQL to Trusted Analytics: Building an On-Prem Semantic Layer for AI Agents
  LLM agents confidently hallucinate in business reports, and the accuracy of Text-to-SQL is clearly insufficient for regulatory and management reporting. I will show you how a semantic layer based on MetricFlow can increase accuracy to 90% or higher, and how to deploy this solution on-prem to ensure that your reports can be trusted.
  - Igor Dmitriev
    Independent expert
  In Russian
  - AI agents
- Talk. Start: 00:00 – Finish: 00:00
  Transactions in PostgreSQL: Parallelizing Non-Parallelizable
  I will explain how we implemented an atomic commit of distributed transactions at the PostgreSQL core level, based on the processing of 2PC/XA mechanisms, and show the results of its testing.
  - Daniil Davydov
    Postgres Professional
  In Russian
  - Databases
- Talk. Start: 00:00 – Finish: 00:00
  YTsaurus in the Wild: Pros, Cons, and Pitfalls
  I will tell you about the experience of implementing and using YTsaurus in Chestny Znak.
  - Nikita Blagodarnyi
    Chestny Znak
  In Russian
  - Databases
- Talk. Start: 00:00 – Finish: 00:00
  Sketches: Useful in Practice or Just Amazing Mathematics?
  The talk covers the pitfalls that keep end-user analysts from adopting sketches widely.
  - René van Bevern
  In Russian
  - DE/ETL
- Talk. Start: 00:00 – Finish: 00:00
  Why the Future of AI Is "Vectorless" and How We Tested It with the Operator Assistant
  How Vectorless helps you deal with the problem of losing data hierarchy.
  - Andrey Nosov
    Raft
  In Russian
  - AI agents
- Talk. Start: 00:00 – Finish: 00:00
  Pimp My Ride: Adapt Your Old Retriever to New Challenges
  Let's explore how to create a good semantic search engine.
  - Vladislav Popov
    Tochka Bank
  In Russian
  - ML/LLMOps
- Talk. Start: 00:00 – Finish: 00:00
  How to Search for a Memory Leak in the Storage for a Month and Find Out That It Actually Does Not Exist
  Let's talk about testing, finding and debugging problems in highly loaded software, as well as support for storage with third-party vendor solutions.
  - Mikhail Motylenok
    YADRO
  In Russian
  - Databases
- Talk. Start: 00:00 – Finish: 00:00
  NiFi Flows: Review and Deploy via Git
  I will tell you about implementing a review and deployment process for NiFi flows in a team with many developers, where the flows are changed several times a day.
  - Klavdia Popova
    Sibur Digital
  In Russian
  - DE/ETL
- Talk. Start: 00:00 – Finish: 00:00
  ML Against Hackers. Processing Hundreds of Thousands of Events Per Second
  We will show how we built a scalable ML platform for detecting hackers using open-source tools (Airflow, Trino, Iceberg, and MLflow).
  - Nikolai Lyfenko
    Positive Technologies
  In Russian
  - ML/LLMOps
- Talk. Start: 00:00 – Finish: 00:00
  Data Contracts: When a Schema Becomes a Contract
  Using a production pipeline as an example, we will show how one merge triggers validation, compatibility checks, ingestion generation, data publishing, and catalog updates.
  - Nikita Borzunov
    Uzum Market
  In Russian
  - DG + DQ
- Talk. Start: 00:00 – Finish: 00:00
  Metric Store as a Boost for AI
  Our experience of building a Metric Store.
  - Dmitriy Shirokov
    Yandex Taxi
  In Russian
  - AI agents
