SmartData 2025

    Schedule

    • Data Tools: 8 in total
    • Data Management: 7 in total
    • Architecture of Data Platforms: 7 in total
    • Use Cases: 4 in total
    • AI/LLM in Data: 4 in total
    • Database Internals: 3 in total
    • DQ: 3 in total
    • MPP: 1 in total
    • Art&Science: 1 in total
    • Off Topic: 4 in total
    • Data Tools

      8
      • Watch recording

        Apache Iceberg Development Prospects

        We will discuss the key challenges that Apache Iceberg is facing, as well as the prospects for technology development.

        • Vladimir Ozerov

          CedrusData

        Hall 2 | In Russian | Complexity: For practicing engineers
      • Watch recording

        Spark is Done!

        Let's talk about Spark. What did it give data engineers? Why do many of us use it?

        Spark has been around for over 15 years. What problems do we face when using it? Is there anything better? Is it already possible to replace it with something?

        Why is %SQLEngineName% slowing down? How can one fix this? Benchmarks, open source, and the like.

        • Evgenii Glotov

          Navio

        Hall 2 | In Russian | Complexity: For practicing engineers
      • Watch recording

        GP2S3 in a Serious Way

        We upload hundreds of terabytes from Greenplum to S3 every day. You can learn about the pitfalls we have collected and what happened in the end.

        • Vladimir Ermakov

          T-Bank

        • Andrei Koshkin

          T-Bank

        Hall 2 | In Russian | Complexity: For practicing engineers
      • Watch recording

        Spark Connect: A New Approach to Working with Apache Spark

        I will tell you about Spark Connect, a new approach to working with Apache Spark that lets you develop the client part of the application in any language without depending on the JVM. We will talk about the architecture of Spark Connect and its differences from classic Spark. You will learn about a project where we used the Spark Connect API for C++.

        • Aleksandr Tokarev

          Yandex

        Hall 1 | In Russian | Complexity: For practicing engineers
      • Watch recording

        Debezium and PostgreSQL After Happy-Path: What Problems Await in Production and How To Solve Them

        Capturing change events from source systems is a common task that can be solved in different ways. One such solution is Debezium. But is it really that simple, and is it always the best choice? I will try to answer these questions and look at Debezium through the difficulties that arise when solving the change data capture task.

        • Nikita Rianov

        Hall 2 | In Russian | Complexity: For practicing engineers
      • Watch recording

        StarRocks: the Reality of the Modern Data Platform

        Our company's data platform has existed for more than 5 years, and during this time it has absorbed a lot of trendy (and not so trendy) solutions. I will tell you how we tried to choose our future among ClickHouse, Greenplum and Trino, and found StarRocks.

        • Stanislav Lysikov

        Hall 1 | In Russian | Complexity: Introduction to technology
      • Watch recording

        Third Party Runtime Engines for Apache Spark: Experience of Using

        Experience of using Comet and Gluten (Velox) execution engines – from the introduction and features of the build to the results of testing on real ETLs. I will tell you about pitfalls and non-obvious points, show the results of work and consider cases when these engines are useful and when they don't work at all.

        • Nikita Blagodarnyi

          Chestnyj znak

        Hall 1 | In Russian | Complexity: For practicing engineers
      • Watch recording

        Apache Spark SQL. Extend and Manage

        How to configure and modify Apache Spark for your tasks without rewriting the framework. I will tell you about approaches to extending the functionality of Spark SQL without touching the platform's source code. You will learn about creating your own data sources, developing user-defined functions for specialized processing, and implementing optimization rules that adapt to various queries.

        • Dmitrii Vertlib

          Chestnyj znak

        Hall 1 | In Russian | Complexity: For practicing engineers
    • Data Management

      7
      • Watch recording

        DWH Monitoring: From Metadata to DataOps

        A practical case study of implementing DWH monitoring from Skyeng: from metadata architecture to automated data quality checks and transition to DataOps practices.

        • Danil Zakharov

          Skyeng

        Hall 3 | In Russian | Complexity: For practicing engineers
      • Watch recording

        DataRentgen: How To Build Yet Another Lineage Without Attracting the Attention of Orderlies

        Description of the path of developing an open source data lineage solution based on OpenLineage. Comparison with other open source solutions — OpenMetadata, DataHub, Marquez — and the reason we abandoned them in favor of our own development. No, this is not another custom Data Catalog :)

        • Maxim Martynov

          MTS Web Services (MWS)

        Hall 2 | In Russian | Complexity: For practicing engineers
      • Watch recording

        How Yandex Market Storage Started Writing Documentation for Objects

        How Yandex Market started writing documentation. You will learn how it happened and what problems the company faced. We will consider different approaches to describing metadata in storages, compare them with each other and understand whether it is worth going down this path.

        • Pavel Kolodkin

          Yandex Market

        Hall 2 | In Russian | Complexity: Introduction to technology
      • Watch recording

        Good Data Doesn’t Happen by Accident

        Good data doesn’t happen by accident. I’ll share my experience building a tool that helps validate data automatically — fast, flexible, and pain-free.

        • Iurii Goryntsev

          Arenadata Catalog

        Hall 2 | In Russian | Complexity: For practicing engineers
      • Watch recording

        Data Catalog: Metadata Distortion or a Product Approach

        Approaches to uploading metadata to the Data Catalog are often considered in a linear way: a minimum of changes, maximum preservation of the "truth". But is this really the right thing to do?

        • Anna Mavliutova

          T-Bank

        Hall 3 | In Russian | Complexity: For practicing engineers
      • Watch recording

        DataContracts: Data Expectations Without Illusions

        How Yandex managed to bring order to the chaos of distributed data using an internal data contract service — without centralization, but with clear responsibility and transparent agreements.

        • Valeriia Terova

          Yandex

        Hall 2 | In Russian | Complexity: Introduction to technology
      • Watch recording

        What Metastore Is

        What a metastore is, how it works in the big data ecosystem, what solutions exist on the market and why we decided to develop our own. I will share our practical experience, the architecture and the lessons we have learned.

        • Mikhail Ivanov

          Positive Technologies

        Hall 3 | In Russian | Complexity: For practicing engineers
    • Architecture of Data Platforms

      7
      • Watch recording

        How We Built a Data Lakehouse Platform on Apache Ozone

        In this talk, I will tell you how we migrated from a platform based on Vertica and HDFS to the new Dota 2 architecture (the second version of our internal analytics platform), based on Apache Ozone (S3), Trino, Spark and Iceberg. I will share our experience in choosing storage, explain why we abandoned HDFS and why we chose Apache Ozone as an on-prem implementation of S3.

        • Vitaliy Moiseev

          Ostrovok!

        Hall 2 | In Russian | Complexity: For practicing engineers
      • Watch recording

        From a Bucket in S3 to Data Lakehouse: The Evolution of a Data Platform in the Race for Autonomy

        How Data Lakehouse became our lifeline: painless migration with a continuous flow of more than 150 TB per day.

        • Nikita Bandurko

          Navio

        • Georgy Popov

          Navio

        Hall 3 | In Russian | Complexity: For practicing engineers
      • Watch recording

        How We Provide Self-Service Development and Deployment of Showcases in Avito

        The architecture of the testing and deployment service for showcases in Avito and the approaches used in testing showcases.

        • Aik Oganesian

          Avito

        • Nikolai Ogorov

          Avito

        Hall 1 | In Russian | Complexity: Introduction to technology
      • Watch recording

        How To Organize a Scalable Research Cluster for More Than 600 Data Scientists Using JupyterHub in Kubernetes

        We'll talk about how Wildberries implements a JupyterHub and Kubernetes-based research platform for more than 600 data scientists who solve problems in areas such as CV, NLP, OCR, and recommendations.

        • Daniil Ponizov

          Wildberries & Russ

        • Vlad Pechen

          Wildberries & Russ

        Hall 2 | In Russian | Complexity: For practicing engineers
      • Watch recording

        DataOps Under the Microscope: CRD and Kubernetes Operators for the ETL Test Tube Lifecycle

        How the T-Bank team migrated DataOps to Kubernetes and didn't go crazy. I'll tell you how we designed and implemented infrastructure for managing the lifecycle of ETL tasks using Kubernetes operators, automated DAG delivery and integrated it into the existing DataOps. I'll analyse what happened, where we made mistakes, and what you absolutely shouldn't do.

        • Sergei Boiko

          T-Bank

        Hall 1 | In Russian | Complexity: For practicing engineers
      • Watch recording

        Launching YugabyteDB in Production

        The database already has a read replica, but that is still not enough. What should you do?

        I'll tell you in detail about our experience with YugabyteDB, which we chose as the solution. We will discuss important settings, nuances from the point of view of development and bugs that we found.

        For those who will be rolling YugabyteDB into production, the talk will save a lot of time and nerves. But it will also be interesting for those who use PostgreSQL or another classic relational database and are thinking about its scalability and fault tolerance.

        • Vasilii Osadchii

          01.tech

        Hall 3 | In Russian | Complexity: For practicing engineers
      • Watch recording

        Criteria for a Good Data Platform From Yandex Delivery

        How can we measure the quality of a data platform and manage its development? I'll tell you how at Yandex Delivery we built a metrics system to evaluate 7 key areas — from infrastructure stability to business data usage.

        • Vladislav Gotsuliak

          Yandex Delivery

        Hall 2 | In Russian | Complexity: For practicing engineers
    • Use Cases

      4
      • Watch recording

        How Challenging Times Forced Us To Build Better BI

        How we at T-Bank built our BI tool on Apache Superset, rebuilt our BI culture, created synergy between BI analysts and the developers of our BI tool, and successfully migrated from Tableau.

        • Roman Nazarenko

          T-Bank

        • Ekaterina Shcherbakova

          T-Bank

        Hall 3 | In Russian | Complexity: Introduction to technology
      • Watch recording

        How We Improved Data Management Processes in Airflow: Practical Cases

        I'll tell you how we use Airflow in practice: from the pain of sensors to the convenience of datasets, from standard features to our own custom solutions. The talk should resonate with anyone who deals with operating Airflow in practice.

        • Dmitrii Morozov

          Innovation Center "Safe Transport"

        Hall 2 | In Russian | Complexity: For practicing engineers
      • Watch recording

        Hadoop Is Not Dead — Just Secure!

        The story of how a small team of engineers implemented Hadoop with full Kerberos and Ranger-based security without stopping business processes.

        • Antony Aleksandrov

          Detsky Mir

        Hall 2 | In Russian | Complexity: For practicing engineers
      • Watch recording

        How X5 Tech Provides Data Analytics Without the Involvement of Analysts, Specialists, and Other Intermediaries

        I'll tell you about an AI assistant that helps users get answers to questions about data. You'll learn how we at X5 Tech manage the quality of answers and how data and data descriptions affect the final result.

        • Vladimir Ermachenkov

          X5 Tech

        Hall 2 | In Russian | Complexity: For practicing engineers
    • AI/LLM in Data

      4
      • Watch recording

        Automation of Configuration of ETL Processes Based on Apache Spark 3, Using RAG and LLM MTS

        I will tell you about a method for automatically optimizing Apache Spark configuration for ETL processes using Spark metrics and a RAG system, which significantly improves the resource utilization of ETL processes.

        • Ilya Kochagin

          MWS Cloud Platform

        Hall 2 | In Russian | Complexity: For practicing engineers
      • Watch recording

        AI Under Lock and Key: How We Deployed a Secure LLM Service for 3,000 Developers

        How to build a secure, powerful, and scalable LLM service for a large company: with UI, API, moderation, and model support for completely different tasks.

        • Ilia Darkovskii

          Kaspersky

        Hall 3 | In Russian | Complexity: For practicing engineers
      • Watch recording

        Semantic RAG: An Analytical Approach to Knowledge Modeling for LLM

        How to build meaningful Retrieval-Augmented Generation (RAG) pipelines where the LLM doesn't just “guess” the answer based on similar chunks, but consciously explores the data based on its structure and relationships.

        • Olga Tatarinova

          Epoch8

        Hall 2 | In Russian | Complexity: For practicing engineers
      • Watch recording

        AI Assistants in Data Management

        The potential of using AI to automate Data Governance processes on the side of data platform users.

        • Oleg Sagitov

          T-Bank

        Hall 3 | In Russian | Complexity: For practicing engineers
    • Database Internals

      3
      • Watch recording

        Codec Usage in ClickHouse: Pros and Cons

        I will show how the LZ4, ZSTD, Delta, and DoubleDelta codecs help increase query speed and reduce storage volume. I will also highlight the challenges that arise when using them in projects.

        • Anastasiia Afanaseva

          GlowByte

        Hall 2 | In Russian | Complexity: For practicing engineers
      • Watch recording

        Vector Search Algorithms in Modern Databases

        A detailed review of the vector search algorithms most widely used in modern database management systems.

        • Alexander Zevaykin

          YDB

        Hall 3 | In Russian | Complexity: Academic talk
      • Watch recording

        Vector Search Algorithms in YDB

        YDB has come a long way from applying basic vector search techniques to building a scalable and efficient vector index. The talk presents a detailed analysis of the stages in the evolution of vector search in YDB, including an analysis of the complexities and engineering solutions.

        • Alexander Zevaykin

          YDB

        Hall 3 | In Russian | Complexity: For practicing engineers
    • DQ

      3
      • Watch recording

        Good Data Doesn’t Happen by Accident

        Good data doesn’t happen by accident. I’ll share my experience building a tool that helps validate data automatically — fast, flexible, and pain-free.

        • Iurii Goryntsev

          Arenadata Catalog

        Hall 2 | In Russian | Complexity: For practicing engineers
      • Watch recording

        How We Searched for Tools for DQ and What We Ended Up With

        Review and comparison of existing Python libraries and a self-written profiling tool for data quality analysis. Description of the tool's functionality.

        • Pavel Pavliukov

          Gazprombank.Tech

        • Alexander Svyazhin

          Gazprombank.Tech

        Hall 3 | In Russian | Complexity: For practicing engineers
      • Watch recording

        Data Quality as a Service: A Self-Service Tool in a Large Company

        How to implement a Data Quality tool with a distributed architecture that ensures smooth operation for a large number of teams and serves as a single point of truth about data quality in company systems.

        • Andrei Azeev

          MWS Cloud Platform

        • Bogdan Petrov

          MWS Cloud Platform

        Hall 3 | In Russian | Complexity: Introduction to technology
    • MPP

      1
      • Watch recording

        DWH in StarRocks: A Year in Production

        The real experience of building DWH in StarRocks: architecture, application cases, pitfalls. Whether StarRocks met our expectations or not.

        • Artem Markin

          Peredovye Platezhnye Resheniya

        Hall 2 | In Russian | Complexity: Introduction to technology
    • Art&Science

      1
      • Watch recording

        Art and Cybernetics

        How the connection between nature and man helps to solve a variety of tasks.

        • Dmitrii Bulatov

        Hall 1 | In Russian
    • Off Topic

      4
      • Watch recording

        State of Data 2025 by SmartData Program Committee

        A year ago, the first State of Data survey was held and its first results were presented. This time we will not just look at the results, but also see the dynamics: what has changed over the year.

        • Oleg Kochergin

          Positive Technologies

        • Sergey Boytsov

        Hall 2 | In Russian
      • No record

        The Round Table “Hadoop Is Dead, Long Live Hadoop?!”

        10 years ago, Hadoop was synonymous with big data. There is a perception that today's cloud platforms and modern data stacks have left it behind. But is this really the case? We will discuss openly and off the record what is really happening and how to live with it.

        • Mikhail Maryufich

          T-Bank

        • Aleksei Belozerskii

          VK Tech, VK Cloud

        • Vitaliy Moiseev

          Ostrovok!

        • Igor Dmitriev

          Wildberries & Russ

        • Dmitry Zuev

          Positive Technologies

        Hall 2 | In Russian | Offline activity, not broadcast or recorded
      • No record

        Lightning Talks

        Lightning talks are a great format for discussing a topic dynamically and finding like-minded people. There will be 20-minute talks on professional topics and live discussions.

        • Artem Dubinin

          VK / VK Tech

        • Dmitrii Shveenkov

          VK

        • Mikhail Lukin

          Sudo

        • Bronislav Zhitnikov

          Positive Technologies

        Hall 3 | In Russian | Offline activity, not broadcast or recorded
      • Watch recording

        SmartData 2025 Closing Session

        We will be summarising the results of the conference, recalling the highlights and talking about future plans. Join us in the hall or online so you don't miss a thing!

        • Mikhail Lukin

          Sudo

        • Bronislav Zhitnikov

          Positive Technologies

        Hall 1 | In Russian
    SmartData 2025

    Conference on Data Engineering


    JUG Ru Group

    Need help?

    • Phone: +7 (812) 313-27-23
    • Email: support@smartdataconf.ru
    • Telegram: @JUGConfSupport_bot

    © JUG Ru Group, 2017–2025