
2 offline days

September 13–14 10:00–19:30 (UTC+3)

Offline: Hotel MonArch, Leningrad Avenue, 31А, building 1, Moscow, Russian Federation

Online broadcast

Why It’s Worth Going

  • Talk in person

    To see old friends. To discuss current problems. To come up with new ideas. To debate and just chat.
  • Switch the format to offline

    To get a change of scenery, unwind, and have a good time. To gain fresh impressions and make new acquaintances.

Broadcast

The offline part of the conference will be broadcast live, and the broadcast is available to participants with any ticket. If you want to meet and interact with the speakers and other participants in person, we look forward to seeing you at the venue. And if you cannot make it to the venue, recordings of all the talks and activities will be waiting for you on this website.

Program

  1. September 13

    • Watch recording

      Talk type: Talk

      I’ll Change the Way You Look at Data Storage in 30 Minutes

      In many business tasks we rely on our DWH, Data Lake, LakeHouse, and so on, built in the image and likeness of how OLAP spreadsheets did it years ago. But business tasks and data processes have changed a lot since then, and for some businesses this approach is fundamentally wrong, because the nature of their data is different from what it was decades ago. The speaker will talk about: how data is different in today's businesses; the approach that Google proposed in its 2015 article; the problems this approach solves; the new problems it creates, and what to do about them now.

    • Watch recording

      Talk type: Talk

      How We Adapted Dynamic YTsaurus Tables to Store Blobs

      To improve the efficiency of YTsaurus, the team decided to separate out blobs and store them apart from "normal" tabular data. They had to modify the compaction algorithms so that "garbage" could be collected among the blocks while keeping a suitable tradeoff between disk space (space amplification) and the amount of data constantly being rewritten (write amplification). They also applied this approach to a number of tables that were kept in RAM. As a result, they moved some of that data to disk (under the guise of blobs!) and reduced RAM consumption severalfold, while maintaining low read times at high quantiles. Along the way, the IO stack had to be significantly improved by switching to io_uring, and the block-storage layer by adding a consistent hashing algorithm to choose how data replicas are placed.

    • Watch recording

      Talk type: Talk

      How to Make Your Apache NiFi Feel Bad

      NiFi is a very powerful tool, and it can cover a very wide range of tasks. However, there are some tasks that make NiFi not feel very good. The speaker will talk about his view on such tasks.

      A talk on how not to use NiFi: which cases NiFi can implement, how to implement them, and why you should not.

    • Watch recording

      Talk type: Partner’s talk

      Building Disaster-Resistant Data Warehouses

      With the exit of foreign vendors, building disaster-resistant data warehouses has become even more difficult. Many people have surely faced this problem and understand the difficulties of implementing such warehouses on Greenplum. Alexander will talk about possible solutions and the best ways to build them, and show the most successful approaches.

    • Watch recording

      Talk type: Talk

      Hadoop in the Cloud is OK

      For OK, Hadoop is a key infrastructure component: it is actively used both for product analytics and for production recommendation systems. In terms of scale, that is more than 200 PB in HDFS, 50k vcores, and 200 TB of RAM. The speaker will talk about the clusters at OK and their migration to the internal container cloud. The talk will cover the details of the final solution, an overview of the migration pitfalls, and the benefits of the approach.

    • Watch recording

      Talk type: Talk

      A distributed SQL query engine for data analytics

      The talk describes the architecture of a distributed open-source SQL engine. When executing queries, the engine loads the data into memory. The computation is divided into stages that can be performed on different nodes. Cross-cluster queries and user-defined functions are supported.

    • Watch recording

      Talk type: Talk

      A Couple of Words on How We Implement Data Observability

      The speaker will talk about the perennial problem of data quality and explain in detail why and how they built the data quality platform at SberHealth. He will cover working with Great Expectations, integration with the data catalog (DataHub), and what happens after they find "broken" data.

    • Watch recording

      Talk type: Talk

      Speed-up queries: How to Cook ClickHouse Well-done

      If you know the rules of working with ClickHouse, you can process hundreds of millions of records in a matter of seconds. Drawing on an analysis of real-world usage, the speaker will tell you about the most popular and effective ways to speed up queries. Indexes: they will not help with duplicates, but they significantly reduce the amount of scanned data. Projections: what, how, and why. Sharding: how to scale horizontally.

    • Watch recording

      Talk type: Talk

      Extra-Atmospheric Astronomy and the New Space Telescope "James Webb"

      Astronomers are cramped on Earth: the atmosphere is in the way, Elon Musk's satellites are in the way, and the planet itself is too small. For astronomers, space has become not only an object of research but also a working platform. What new things have scientists learned with the help of space telescopes, and what are the prospects?

    • No recording

      Talk type: Game

      Quiz

      Tired of thinking during the talks? Then come and think at our creative quiz! Together with Liga Indigo, we will hold an intellectual quiz where you can test your intellect and erudition and relax in the company of other conference participants.

    • No recording

      Talk type: BOF-session

      Science and Programmers in Space

      We will talk about international space cooperation and competition. We will discuss modern domestic instruments, as well as issues of import substitution in software and hardware for space observations. We will consider how programmers can make a useful contribution to space science.

  2. September 14

    • Watch recording

      Talk type: Talk

      Fast data processing in Data Lake with Trino

      The speaker will cover the implementation and practical use of the key optimizations that allow Trino and related commercial products to quickly "crunch" data from your lake: using Parquet and ORC metadata to reduce the amount of data read (project/filter/aggregate pushdown), dynamic (runtime) filtering, late materialization of columns, and as many as three local caches: a metadata cache, a data cache, and an intermediate query results cache.

    • Watch recording

      Talk type: Talk

      How to Process Data with Spark in the Cloud

      How can you build a data processing pipeline using cloud services (DataProc and DataSphere) and set up interaction with a Spark cluster via Jupyter notebooks, and why is it convenient to do this in managed services? How can you teach the system to spin up the cluster for you exactly when you need it, and save money doing so? What challenges do companies face when migrating, and what solutions do they find? What are the peculiarities of cloud services? What do you need to be prepared for, and what improvements might be needed?

    • Watch recording

      Talk type: Talk

      How We Merged the Data of Delivery Club and Yandex Eda in Two Months

      The team had an ambitious task: to combine two full-fledged data warehouses, those of Delivery Club and Yandex Eda, in just over eight weeks and, before backend integration, to provide reporting with basic business metrics and data on Delivery Club. Olga will tell how they implemented this project: how they collected the task scope, estimated the tasks, and adjusted along the way. She will also talk about the technical implementation of the DWH improvements: what solution architecture was devised, what stack was used, and why. Of course, there were some pitfalls: she will discuss which ones the team ran into and how to avoid them.

    • Watch recording

      Talk type: Talk

      Moving Towards Universality: A Hybrid OLTP Database with OLAP Query Support

      Integrating OLTP and OLAP functions in a distributed database means overcoming traditional barriers on the way to a universal solution. Alexey will talk about the process of developing such a solution, YDB, which combines OLTP and OLAP functionality to run transactional and analytical queries simultaneously. He will discuss the main architectural features of such a system, compare it with ClickHouse and other standard solutions, and share his experience of implementing and using this database in real projects.

    • Watch recording

      Talk type: Talk

      Compression, encryption and more: changing the behavior and guarantees of a distributed database

      In this talk you will learn about data compression and encryption, on disk and in memory, in the context of a distributed database, and about the advantages and disadvantages of both approaches. The speaker will also consider other options for data transformation, such as filtering, and ways to implement them in an open source product.

    • Watch recording

      Talk type: Talk

      Kafka Connect: What Is This Single Message Transform Thing of Yours?

      We will consider working with Single Message Transformations (SMT) in Kafka Connect in general, and in Debezium in particular. The speaker will explain what SMT is and how to use it in practice, and will walk through the implementation process with code examples. He will cover the pitfalls, discuss customization and configuration, and provide examples of use cases in real-world scenarios.

    • Watch recording

      Talk type: Talk

      Data Management Platform around YTsaurus

      Vladimir will share their experience of building a data management platform around YT and explain where it works well and where it can be supplemented with different frameworks or other analytical databases. This topic can be useful for architects and data engineers who are going to build a new DWH or revise the architecture of an existing one, and are facing the hard question of choosing technologies from the Open Source world.

    • Watch recording

      Talk type: Talk

      Spark Streaming: To Use or Not To Use?

      Apache Spark Streaming is quite versatile and has rich functionality. But there are tasks where Spark Streaming is not the best choice and can become more of a burden than an effective solution. Evgeny will talk about the advantages and disadvantages of Spark Streaming: when it is worth using this particular tool, and when it is better to consider other options. He will also present a checklist for using Spark Streaming in projects.

    • Watch recording

      Talk type: Talk

      Predictive Analysis of Parasitic Load on Greenplum Clusters

      The essence of the problem: since Greenplum does not share resources between segments and operates at the speed of the slowest segment, situations may arise in which some resources are underutilized or utilized unevenly, which hurts the efficiency of executed queries. In highly loaded production systems it is not possible to manually analyze the efficiency of every query. And some queries can have a negative impact on all processes on a Greenplum cluster. The speaker will tell you how to solve these problems.

    • Watch recording

      Talk type: Talk

      Application of TLA+ for Efficient Testing of Distributed Systems

      In this talk we will look at the problems of developing and testing distributed systems, and consider the TLA+ specification language and its application to program verification. In addition, we will describe a method of testing distributed systems based on the actor model, which combines the advantages of both fuzzing and TLA+.

  • Discussions

    Live conversation with speakers between activities. No recording and no time limit.
  • BoF

    Informal conversations without hosts or speakers. This is where new ideas are born.
  • Round tables

    Speakers and experts discuss current industry issues.

Bonus

  • Coffee and lunch breaks

    Buffet and beverages of your choice. If you have dietary restrictions, write to our support team. We’ll find a solution.
  • Networking and Afterparty

    Informal atmosphere, networking for all participants, speakers, and experts. Heart-to-heart talks and an afterparty at the end of the first offline day.

FAQ

  • Where will the offline part of the conference be held?
    The offline part will be held on September 13–14 at the following address: Hotel MonArch, Leningrad Avenue, 31А, building 1, Moscow, Russian Federation.
  • When will the program and time for the offline part of the conference be known?
    We begin publishing the program in batches on the conference website one month in advance.
  • What activities will be included in the offline part of the conference?

    The offline part will include:

    • talks;
    • roundtables;
    • BoF-sessions: interest-based meetings without a fixed schedule;
    • discussions with offline and online speakers who will come to the venue;
    • an afterparty for participants at the end of the first offline day.
  • Will there be an online broadcast of the offline part of the conference?

    We will broadcast most of the activities of the offline part live: talks, roundtables, etc.

    Discussions and BoF-sessions will not be broadcast or recorded.

  • Offline was so long ago that I no longer remember what the procedure was for offline conferences.
    Don’t worry, before the conference we will send you a participant’s memo. It will contain all the necessary information.
  • Can I buy a ticket only for the offline part of the conference?
    To attend the offline part, you must purchase an ONLINE + OFFLINE ticket. It entitles you to attend the offline part of the conference and gives you lifetime access to the recordings of the online part.
  • How do I get into the offline part if I have an ONLINE ticket?
    If you already have a ticket for the online part of the conference, you can upgrade it to ONLINE + OFFLINE. To do so, email our support team at support@smartdataconf.ru
  • How do I get to the offline part if the company only paid for my ONLINE ticket?
    If the company that paid for your ticket is not willing to upgrade to ONLINE + OFFLINE, you can do it yourself at a discount. The discount is given for taking the survey after the online part of the conference ends.
  • Is there a limit to the number of tickets for the offline part?

    The number of tickets is limited to the capacity of the conference venue.

    So it is better to buy tickets in advance while they are available.

  • Are there any restrictions on going to an offline conference?

    There will be no COVID restrictions on attending in person. You don’t need QR codes or PCR tests to enter the venue. For your safety, a qualified medical worker will be on duty at the venue at all times.

    However, if you’re feeling unwell, it’s best to refrain from attending in person. You will be able to participate in the offline part remotely or watch the recordings of the talks.