Offline day
10:00–22:30 (UTC+3)
Offline: Park Inn by Radisson Pulkovskaya, 1 Pobedy Square, Saint Petersburg, Russian Federation
Online broadcast
Why It’s Worth Going
- To see old friends. To discuss current problems. To come up with new ideas. To debate and just chat.
Switch the format to offline
To have a change of scenery, to unwind and have a good time. To gain fresh impressions and new acquaintances.
Broadcast
The Offline day of the conference will be broadcast live, and the broadcast is available to participants with any ticket. If you want to meet and interact with the speakers and other participants in person, we are waiting for you at the venue. And if you cannot make it to the venue, recordings of all the talks and activities will be waiting for you on this website.
See for yourself
Program
Watch recording Talk type: Conversation
In-person Opening SmartData 2022
We will talk about the schedule, sessions, and share the information. Come to the room or join the broadcast to find out what to expect soon!
Nikolay Markov
Company: Aligned Research Group
Alexey Fyodorov
Company: JUG Ru Group
Watch recording Talk type: Talk
100 billion messages in Kafka: load and forget
Apache Kafka is a great tool for reliably passing messages between services, but offloading its content for offline analytics has proven to be no easy task, especially when we're talking about hundreds of billions of messages a day, every day. Apache Spark comes to the rescue, but unfortunately, its capabilities alone aren't enough to work reliably and fully automatically on really big data volumes. The speaker will talk about how to offload 100 billion messages a day from Apache Kafka to HDFS and stop thinking about it.
The talk will be of interest to developers in Big Data who use Kafka to transfer large amounts of data to Hadoop.
Denis Efarov
Company: Odnoklassniki
Watch recording Talk type: Talk
Love and hate Prefect 2.0 after Apache Airflow
The speaker will review Prefect 2.0 and its basic concepts. She will compare it to Apache Airflow, covering both its strengths and its weaknesses. You will learn which cases this tool is best suited for.
Iuliia Volkova
Company: Independent consultant
Watch recording Talk type: Interview
Interview with Andrey Kuznetsov and Mikhail Maryufich, Odnoklassniki
Let's talk with Andrey and Mikhail about data engineering at Odnoklassniki and discuss other topics. Join in!
Mikhail Maryufich
Company: Odnoklassniki
Andrey Kuznetsov
Company: Odnoklassniki
Alexey Fyodorov
Company: JUG Ru Group
Watch recording Talk type: Talk
Path to the data model for the daily update of the past 100 days
A story about how we chose a data model for a data store in which we have to update the last 100 days of data every day. We will look at point-to-point block replacements, the single-key table approach, Data Vault, and a couple of other approaches, and pick a winner for our task.
Maksim Statsenko
Company: Yandex
Tatyana Kolmakova
Company: Yandex
Watch recording Talk type: Talk
Ingest layer of the data platform: mix but do not shake
A story about how the speaker's team built an Ingest layer for internal and external sources within the SberHealth data platform without neglecting sensitive data and the data directory. Since the platform has to abstract the components underneath, we'll also talk about the DSL used to manage it all.
Oleg Kochergin
Company: SberHealth
Watch recording Talk type: Interview
Interview with Vasily Kutsenko, Pochta.Tech
We'll discuss the connection between DevOps and data warehousing and other topics with Vasily. Join us!
Vasily Kutsenko
Company: Pochta.Tech
Nikolay Markov
Company: Aligned Research Group
Watch recording Talk type: Talk
How SQL queries are executed in Presto/Trino
Presto/Trino is a distributed serverless SQL engine for big data. In this talk, we discuss query execution in Presto/Trino.
Vladimir Ozerov
Company: Querify Labs
Watch recording Talk type: Talk
Distributed high-loaded feature store OK
The speaker will talk about why his team wrote their own feature store at OK, how it is designed, and how it is operated.
Andrey Kuznetsov
Company: Odnoklassniki
Watch recording Talk type: Interview
Interview with Maksim Statsenko and Tatyana Kolmakova, Yandex
Let's discuss data engineering at Yandex and other topics with Maksim and Tatyana. Join us!
Maksim Statsenko
Company: Yandex
Tatyana Kolmakova
Company: Yandex
Nikolay Markov
Company: Aligned Research Group
Watch recording Talk type: Talk
Automated Spark application tuning
Valeriia will talk about a Hadoop cluster where hundreds of daily and thousands of hourly Spark calculations run. The calculations are all very different, and each has its own SLA. In this situation, it is unrealistic for engineers to tune every job by hand. That's why the team built and rolled out a fully automatic tuning system based on the logs that Spark itself writes. Valeriia will show how to easily extract a lot of information from these logs offline and what to look for when automatically tuning spark.executor.memory. She will also explain in detail how the tuning system is set up and what allows it to keep adjusting to changes. The talk will be of interest to those who already work with Spark and have an idea of its structure.
Valeriia Dymbitskaia
Company: oneFactor
Watch recording Talk type: Talk
NiFi scripts as an element of Less Code ETL
There are many transformations in NiFi that do not require coding. But not everything can be done with out-of-the-box transformations. Developing a processor for each unique transformation is an interesting but expensive option. In NiFi, you can use scripting and get a more flexible data transformation tool. Bronislav will tell you when to choose scripting and how to do it most effectively. This talk is for active NiFi users, as well as for those who are considering NiFi as an ETL tool for their tasks.
Bronislav Zhitnikov
Company: Tinkoff
Watch recording Talk type: Interview
Interview with Denis Efarov and Sergey Mikhalev, Odnoklassniki
Let's talk with Denis and Sergey about data engineering at Odnoklassniki and discuss other topics. Join in!
Denis Efarov
Company: Odnoklassniki
Sergey Mikhalev
Company: Odnoklassniki
Alexey Fyodorov
Company: JUG Ru Group
Watch recording Talk type: Talk
Reliable and scalable data pipelines at OK
The speaker will talk about which pipeline management systems were written at Odnoklassniki, and how (and why) they were replaced with an Airflow cluster that is resilient to data center failure.
Mikhail Maryufich
Company: Odnoklassniki
Watch recording Talk type: Talk
Using the GrowthBook platform to manage ML experiments
In this talk, we describe a way to organize an experiment pipeline, based on the open-source GrowthBook platform, in which the responsibility for launching and testing features lies with the ML development team. The proposed approach is intended to reduce the number of integrations on the side of the core development team while speeding up the delivery of new versions of machine learning models to production.
Valentin Panovskiy
Company: more.tv
Watch recording Talk type: Interview
Interview with Bronislav Zhitnikov, Tinkoff
Let's discuss data engineering at Tinkoff and other topics with Bronislav. Join us!
Bronislav Zhitnikov
Company: Tinkoff
Maksim Statsenko
Company: Yandex
Watch recording Talk type: Talk
What is DevOps in the world of data warehousing?
Petabytes of data go through Pochta.Tech's services. Dozens of teams and departments work with it, using a variety of frameworks and technologies. Most of this data is stored and developed in DataCloud. Vasily will talk about how DevOps practices are used in working with data warehouses and how this can reduce time-to-market.
Vasily Kutsenko
Company: Pochta.Tech
Watch recording Talk type: Talk
How to load everything into the data catalog and not die
It is not enough to create a convenient data catalog; the biggest job is to fill it with metadata taken from a huge number of different sources.
In his talk, Ivan will explain why they had to switch from a pull approach to a push approach, and cover the specifics of the technical implementation and the problems they encountered.
The talk will be useful for those who have already implemented or are thinking of implementing or developing a data catalog.
Ivan Kanashov
Company: Tinkoff
Watch recording Talk type: Interview
Interview with Valentin Panovskiy, more.tv
Let's discuss data engineering at more.tv and other topics with Valentin. Join us!
Valentin Panovskiy
Company: more.tv
Maksim Statsenko
Company: Yandex
Watch recording Talk type: Talk
Recovering a distributed database after a crash
Imagine you were editing a document, but deleted it by mistake. Rolling back to Report3_release2FinalLast-Fixed!!!4.txt.bak.bak, saved on a flash drive, and adding a few changes from memory would fix the problem.
Now imagine that several people were editing a document online and the server went down. A server backup and coordinated work by the authors of the document would solve the problem.
Finally, imagine that thousands of people edited millions of documents on hundreds of servers with asynchronous replication to a backup cluster, but a bug in the code caused every millionth change within each cluster to be lost. Is there a solution to such a problem?
The speaker will tell you what to do when code-review, failover, and certification did not help avoid a distributed database crash.
Anton Vinogradov
Company: Apache Software Foundation
Watch recording Talk type: Conversation
SmartData 2022 Conference Closing
We take stock, remember the bright moments and talk about our plans. Come to the room or join the broadcast, so you don't miss anything!
Maksim Statsenko
Company: Yandex
Alexey Fyodorov
Company: JUG Ru Group
Discussions
Live conversation with speakers between activities. No recording and no time limit.
BoF
Informal conversations without hosts or speakers. This is where new ideas are born.
Round tables
Speakers and experts discuss current industry issues.
Bonus
Coffee and lunch breaks
Buffet and beverages of your choosing. If you have food restrictions, write to our support team. We’ll find a solution.
Networking
Informal atmosphere and heart-to-heart talks. Networking for all participants, speakers, and experts.
COVID-19
We have eliminated the COVID restrictions on site visits. Now you don’t need QR codes or PCR tests to enter the venue.
However, if you’re feeling unwell, it’s best to refrain from attending in person. It’s important to take care of yourself and those around you.
You will be able to watch the conference broadcast online, and you can get a refund of the difference in ticket price or exchange your ticket for the next season’s ticket. If you can’t attend the venue, email our support team and we’ll help you.
How can I access the conference?
Only a ticket is required to attend the conference. QR codes and PCR tests are not required to enter the venue. However, if you feel unwell, it is better to refrain from attending in person. It is important to take care of both yourself and those around you.
What if I have bought an offline ticket? Will I be refunded?
If you get sick and can’t attend the venue, you’ll get your money back if the option of watching the online broadcast doesn’t work for you.
You can connect to the broadcast and watch everything online. To refund the difference between "Double Online" and Online+Offline tickets, email our support team: support@smartdataconf.ru.
What security measures will be on site?
- There will be sanitizers and masks. However, it is not obligatory to wear a mask; it is up to you.
- An ambulance team is constantly on duty at the site.
Please send all questions and clarifications to support@smartdataconf.ru.
FAQ
Where will the Offline day of the conference be held?
The Offline day will be held on October 29 at the following address: Park Inn by Radisson Pulkovskaya, 1 Pobedy Square, Saint Petersburg, Russian Federation.
When will the program and time for the Offline day of the conference be known?
We will publish the program on the conference website starting in the second half of September.
What activities will be included on the Offline day of the conference?
The offline day will include:
- talks
- roundtables
- BoF sessions: interest-group meetings without a fixed schedule
- discussions with the offline and online speakers who come to the venue
Will there be an online broadcast of the Offline day of the conference?
We will broadcast most of the offline day's activities live: talks, roundtables, etc.
Discussions and BoF-sessions will not be broadcast or recorded.
Offline was so long ago that I no longer remember what the procedure was for offline conferences.
Don’t worry, before the conference we will send you a participant’s memo. It will contain all the necessary information.
Can I buy a ticket only for the Offline day of the conference?
To attend the Offline day, you must purchase an "Online+Offline" ticket. It entitles you to attend the offline day of the conference and gives you access to the recordings of the online day.
How do I get into the Offline day if I have a "Double Online" ticket?
If you already have a ticket for the online part of the conference, you can upgrade it to "Online+Offline". To do so, email our support team at support@smartdataconf.ru.
How do I get to the Offline day if the company only paid for my "Double Online" ticket?
If the company that paid for your ticket is not willing to upgrade it to "Online+Offline", you can do it yourself at a discount. The discount is given for taking the survey after the online part of the conference ends.
Is there a limit to the number of tickets for the Offline day?
The number of tickets is limited to the capacity of the conference venue.
So it is better to buy tickets in advance while they are available.
Are there any restrictions on going to an offline conference?
We have eliminated the COVID restrictions on site visits. Now you don’t need QR codes or PCR tests to enter the venue.
However, if you’re feeling unwell, it’s best to refrain from attending in person. It’s important to take care of yourself and those around you.
What will happen to the Offline day if there is a new wave of COVID-19?
So far we haven’t seen an increase in COVID-19 incidence, so we’re cancelling the COVID restrictions on visiting the venue. Now you don’t need QR codes or PCR tests to get in. There will be sanitizers and disposable masks on site. Masks will not be mandatory unless a requirement is in force at the time of the conference.
If you are feeling unwell, it is best to refrain from attending offline. It is important to take care of both yourself and those around you.
If the situation worsens and offline events are canceled, we will move the Offline day online. In that case, speakers will give their talks remotely or from our studio. Nothing will change for participants with a "Double Online" ticket. "Online+Offline" participants can convert their ticket into a "Double Online" ticket with a refund of the price difference, or carry it over to the next year. It will also be possible to return the ticket and get a full refund.
Either way, we will not be postponing the conference to next year.