Schedule

  • Times in the program are shown in your time zone.

  • The program has not been finalized yet, so it may still change.

  • Data Tools

    8
    • Talk

      Spark is Done!

      Let's talk about Spark. What did it give data engineers? Why do many of us use it?

      Spark has been around for over 15 years. What problems do we face when using it? Is there anything better? Can it already be replaced with something else?

      Why is %SQLEngineName% slowing down? How can one fix this? Benchmarks, open source, and the like.

    • Talk

      GP2S3 in a Serious Way

      We upload hundreds of terabytes from Greenplum to S3 every day. Learn about the pitfalls we hit along the way and how it all turned out.

    • Talk

      Spark Connect: A New Approach to Working with Apache Spark

      I will tell you about Spark Connect, a new approach to working with Apache Spark that lets you develop the client side of an application in any language without depending on the JVM. We will talk about the architecture of Spark Connect and how it differs from classic Spark. You will learn about a project where we used the Spark Connect API for C++.

    • Talk

      Ways To Organize CDC in PostgreSQL and Why Debezium out of the Box Will Not Solve All Problems

      Getting change events from sources is a common task that can be solved in different ways. One such solution is Debezium. But is it really that simple, and is it always the best choice? I will try to answer these questions and look at Debezium in terms of the difficulties that arise when implementing change data capture.

    • Talk

      StarRocks: the Reality of the Modern Data Platform

      The data platform in our company has existed for more than five years, and during that time it has absorbed a lot of trendy (and not so trendy) solutions. I will tell you how we tried to choose our future among ClickHouse, Greenplum, and Trino, and found StarRocks.

    • Talk

      Third-Party Runtime Engines for Apache Spark: Hands-On Experience

      Our experience with the Comet and Gluten (Velox) execution engines, from an introduction and build specifics to test results on real ETL jobs. I will cover pitfalls and non-obvious details, show the results, and look at cases where these engines are useful and where they do not work at all.

    • Talk

      Apache Spark SQL. Extend and Manage

      How to configure and modify Apache Spark for your tasks without rewriting the framework. I will tell you about approaches to extending Spark SQL without touching the platform's source code. You will learn about creating your own data sources, developing user-defined functions for specialized processing, and implementing optimization rules that adapt to various queries.

  • Data Management

    7
    • Talk

      DWH Monitoring: From Metadata to DataOps

      A practical case study from Skyeng on implementing DWH monitoring: from metadata architecture to automated data quality checks and the transition to DataOps practices.

    • Talk

      DataRentgen: What’s Wrong With OSS Data Catalog and How To Make It Better

      The story of developing an open source data lineage solution based on OpenLineage. A comparison with other open source solutions (OpenMetadata, DataHub, Marquez) and the reasons we abandoned them in favor of our own development. No, this is not another custom Data Catalog :)

    • Talk

      How Yandex Market Storage Started Writing Documentation for Objects

      How Yandex Market started writing documentation. You will learn how it happened and what problems the company faced. We will look at different approaches to describing metadata in storage systems, compare them, and work out whether this path is worth taking.

    • Talk

      Good Data Doesn’t Happen by Accident

      Good data doesn’t happen by accident. I’ll share my experience building a tool that helps validate data automatically — fast, flexible, and pain-free.

    • Talk

      DataContracts: Data Expectations Without Illusions

      How Yandex managed to bring order to the chaos of distributed data using an internal data contract service — without centralization, but with clear responsibility and transparent agreements.

    • Talk

      What Metastore Is

      What a metastore is, how it works in the big data ecosystem, what solutions exist on the market, and why we decided to develop our own. I will share our practical experience, the architecture, and the lessons we have learned.

  • Architecture of Data Platforms

    7
  • Use Cases

    4
  • AI/LLM in Data

    4
  • Database Internals

    3
    • Talk

      Codec Usage in ClickHouse: Pros and Cons

      I will show how the LZ4, ZSTD, Delta, and DoubleDelta codecs help increase query speed and reduce storage volume. I will highlight the challenges that arise when using them in projects.

    • Talk

      Vector Search Algorithms in YDB

      YDB has come a long way from applying basic vector search techniques to building a scalable and efficient vector index. The talk presents a detailed analysis of the stages in the evolution of vector search in YDB, including the complexities involved and the engineering solutions.

  • DQ

    2
  • MPP

    1
    • Talk

      DWH in StarRocks: A Year in Production

      Real-world experience of building a DWH in StarRocks: architecture, use cases, and pitfalls, and whether StarRocks met our expectations.

  • Off Topic

    3
    • Talk

      Lightning Talks

      Lightning talks are a great format for lively discussion of a topic and for finding like-minded people. There will be 20-minute talks on professional topics and live discussions.

    • Conversation

      SmartData 2025 Closing Session

      We will be summarising the results of the conference, recalling the highlights and talking about future plans. Join us in the hall or online so you don't miss a thing!

We will add more talks soon.

We are actively adding to the program. Sign up for our newsletter to stay informed.
