Schedule

Times in the program are shown in your time zone.

Full program published

All talks are now available — you can plan your visit.

The program may change — subscribe for updates.


September 23
00:00
Conversation
Opening of SmartData 2026
We'll go over the schedule and sessions and share practical information.
- SmartData Program Committee
In Russian
00:00
Talk
How to Teach LLM to Work with Data Instead of Just Writing Plausible SQL
How to teach an LLM not just to write plausible SQL, but to actually work with corporate data: find the right sources, understand metrics, write ETL, and validate your own answers.
- Maksim Statsenko
  Yandex
In Russian
- AI agents
Talk
What Happens Between SELECT and DATA
Let's explore what actually happens between a request and a result.
- Petr Gurinov
  Yandex Cloud
In Russian
- Databases
Talk
How Platformization and AI Are Changing the Analytics Development Lifecycle: T-Bank’s Experience
A talk about why a collection of fragmented tools stops working at the scale of a large Data Platform, and why the platform should be viewed as a unified ADLC rather than a set of separate services. I will show how this affects ETL, ad hoc development, Data Governance, Data Quality, and metrics, and why AI and the agent-based approach are becoming the main drivers of new platform requirements.
- Dmitrii Rudnev
  T-Bank
In Russian
- DG + DQ
00:00
Talk
LLM Under Load: How to Measure the Performance of Self-Hosted Models
In this talk, I will analyze a practical approach to measuring self-hosted LLM performance.
- Roman Peskov
  Cian
In Russian
- ML/LLMOps
Talk
Kafka News: KRaft, Queues, Tiered Storage (And a Bit About YDB)
Parallel reading from Kafka topics, KRaft, server balancing, and tiered storage.
- Andrey Serebryanskiy
  Yandex Cloud
In Russian
- DE/ETL
Talk
S3, HDFS, POSIX... or All of the Above? Building a Data Lake the Chinese Way with CubeFS
We'll dive into the architectural decisions behind CubeFS that make it possible to build exabyte-scale storage for ML and analytics workloads. Topics include its high-performance, horizontally scalable metadata service, local and distributed caching, transparent data movement across storage tiers, and other key design features.
- Ivan Arkhipov
In Russian
- Platforms and storage systems
00:00
Talk
State of Iceberg REST Catalogs: What We're Missing and How to Make a DIY Control Plane
Let's talk about what important functions are needed to manage Iceberg tables and the role of REST Catalog in this.
- Vitaliy Moiseev
  Ostrovok!
In Russian
- Platforms and storage systems
Talk
Is There Life After dbt?
In the talk, I will review the current state of the data transformation ecosystem, as well as alternative tools and promising projects that may replace dbt.
- Alexandra Popova
  Positive Technologies
In Russian
- DE/ETL
Talk
PostgreSQL Performance Diagnostics, or the Detective Called "Something's Slowing Down the Database"
The talk focuses on practical PostgreSQL performance diagnostics for backend developers who maintain their databases independently and do not have a dedicated DBA.
- Stepan Fomichev
  Yandex Cloud
In Russian
- Databases
00:00
Talk
LLM Ops: Optimization of Inference and ML-serving in a Real Production Cluster
The talk is about practical experience in optimizing inference and ML-serving based on GPUStack in the production environment of the corporate AI Portal.
- Dmitry Ibragimov
  Lemana Tech
In Russian
- ML/LLMOps
Talk
Data Marts on Data Lakehouse: A Major Migration from Greenplum 6
We will discuss our real-world experience of migrating data marts from a monolithic Greenplum 6 solution to a Data Lakehouse stack, focusing on how to make the process as painless as possible for users. You will learn which non-obvious problems to expect and how to build processes so that the new architecture is more efficient than the legacy solution rather than a slower copy of it.
- Artemii Naumov
  Lemana Tech
In Russian
- Platforms and storage systems
00:00
Talk
Knowledge Graph as an Infrastructure for AI Agents: From Datasets to a Single Graph
I will tell you how we built a single knowledge graph on top of dozens of disparate corporate datasets — an infrastructure where an AI agent doesn't guess an answer based on similar chunks, but consciously navigates the structure and relationships of data.
- Aleksandr Nepochatykh
  Sber
In Russian
- ML/LLMOps
Talk
Vector Search in PostgreSQL: pgvector Under the Hood
How pgvector works: vector storage, HNSW and IVFFlat algorithms, performance degradation points. An honest breakdown of where the solution holds up and where it doesn't.
- Daria Barsukova
  Postgres Pro
In Russian
- Databases
Talk
Datapipe — Data Transformation Using K8s and S3
How we learned to use Python, K8s, and S3 to efficiently count data in the cloud.
- Sergey Zakharchenko
  EPOCH8
In Russian
- DE/ETL
00:00
Talk
Buying More or Keep On Using: How an AI Assistant Became Ready for Production
I'll show you how to estimate memory and KV-cache usage and how inference-layer solutions change the load profile, and then we'll move on to our implementation in Deckhouse.
- Alexander Podmoskovniy
  Flant
In Russian
- ML/LLMOps
Talk
Migration of Data Management Tools to OMD at Magnit
I will tell you how we built the Magnit Data ecosystem, where the catalog, glossary, DQ engine, dashboards, and chatbot work as one mechanism.
- Oleg Molchanov
  Magnit
In Russian
- DG + DQ
Start: 00:00 – Finish: 00:00
Networking and Afterparty
September 24
00:00
Talk
Postmortem Comparisons of Agentic and Classical AutoML: Typical Pitfalls of the Agentic Approach
I will analyze the components of success and failure and provide a practical checklist that will help you quickly decide whether you need an agent or a classic AutoML model to generate a baseline model.
- Valeriia Dymbitskaia
  Upgini
In Russian
- AI agents
Talk
The MDM That Stores Nothing: How to Match Data Without Centralizing It
A classic MDM system often assumes that data needs to be brought together in one place: loaded, normalized, matched, assigned a golden record, and then managed centrally as master data. But what do you do when, due to security or regulatory requirements, the system is not allowed to store data within its own perimeter?
- Iurii Goryntsev
  Arenadata Catalog
In Russian
- DG + DQ
00:00
Talk
Reading Faster Than Ceph Can Serve: How We Built S3 Sharding With No Extra Infrastructure
Our Trino storage hit the performance ceiling of a single Ceph cluster — so we started spreading every table across several clusters at once, hiding all the sharding logic in the HAProxy sidecars on our compute nodes, without adding a single new component to the architecture. Reads sped up from 20 to 60–80 GB/s, and GET latency dropped from minutes to 1–2 seconds.
- Dmitrii Listvin
  Avito
In Russian
- Platforms and storage systems
Talk
From Text-to-SQL to Trusted Analytics: Building an On-Prem Semantic Layer for AI Agents
LLM agents confidently hallucinate in business reports, and the accuracy of Text-to-SQL is clearly insufficient for regulatory and management reporting. I will show you how a semantic layer based on MetricFlow can increase accuracy to 90% or higher, and how to deploy this solution on-prem to ensure that your reports can be trusted.
- Igor Dmitriev
  Independent expert
In Russian
- AI agents
Talk
Transactions in PostgreSQL: Parallelizing Non-Parallelizable
I will explain how we implemented an atomic commit of distributed transactions at the PostgreSQL core level, based on the processing of 2PC/XA mechanisms, and show the results of its testing.
- Daniil Davydov
  Postgres Professional
In Russian
- Databases
00:00
Talk
YTsaurus in the Wild: Pros, Cons, and Pitfalls
I will tell you about the experience of implementing and using YTsaurus in Chestny Znak.
- Nikita Blagodarnyi
  Chestny Znak
In Russian
- Databases
Talk
Sketches: Useful in Practice or Just Amazing Mathematics?
The talk shows the pitfalls that keep sketches from being widely used by end-user analysts.
- René van Bevern
In Russian
- DE/ETL
Talk
Why the Future of AI Is "Vectorless" and How We Tested It with the Operator Assistant
How Vectorless helps you deal with the problem of losing data hierarchy.
- Andrey Nosov
  Raft
In Russian
- AI agents
00:00
Talk
Pimp My Ride: Adapt Your Old Retriever to New Challenges
Let's explore how to create a good semantic search engine.
- Vladislav Popov
  Tochka Bank
In Russian
- ML/LLMOps
Talk
How to Search for a Memory Leak in the Storage for a Month and Find Out That It Actually Does Not Exist
Let's talk about testing, finding, and debugging problems in high-load software, as well as supporting storage that includes third-party vendor solutions.
- Mikhail Motylenok
  YADRO
In Russian
- Databases
Talk
NiFi Threads Review and Deploy Via Git
I will tell you about implementing a review and deployment process for NiFi threads in a team with many developers, where changes to the threads are made several times a day.
- Klavdia Popova
  Sibur Digital
In Russian
- DE/ETL
00:00
Talk
ML Against Hackers. Processing Hundreds of Thousands of Events Per Second
We will show how we built a scalable ML platform for detecting hackers using open-source tools (Airflow, Trino, Iceberg, and MLflow).
- Nikolai Lyfenko
  Positive Technologies
In Russian
- ML/LLMOps
Talk
Data Contracts: When a Schema Becomes a Contract
Using a production pipeline as an example, we will show how one merge triggers validation, compatibility checks, ingestion generation, data publishing, and catalog updates.
- Nikita Borzunov
  Uzum Market
In Russian
- DG + DQ
Talk
Metric Store as a Boost for AI
Our experience of building a Metric Store.
- Dmitriy Shirokov
  Yandex Taxi
In Russian
- AI agents
00:00
Conversation
Closing Ceremony of SmartData 2026
Summing up the results of the conference, remembering the highlights and talking about plans.
- SmartData Program Committee
In Russian