
Offline day

 10:00–19:45 (UTC+3)

Offline: Park Inn by Radisson Pulkovskaya, 1 Pobedy Square, Saint‑Petersburg, Russian Federation

Online broadcast

Why It’s Worth Going

  • Talk in person

    To see old friends. To discuss current problems. To come up with new ideas. To debate and just chat.
  • Switch the format to offline

    To get a change of scenery, take a break, and have a good time. To gain fresh impressions and make new acquaintances.

Broadcast

The Offline day of the conference will be broadcast, and the broadcast is available to participants with any ticket. If you want to meet and interact with the speakers and other participants live, we are waiting for you at the venue. If you cannot come to the venue, recordings of all the talks and activities will be waiting for you on this website.

Program

  • The time in the program is for the time zone UTC+3.

  • As the Offline day approaches, the program may be updated.

  • Talk

    Room 1

    In-person Opening SmartData 2022

    We will go over the schedule and sessions and share practical information. Come to the room or join the broadcast to find out what to expect soon!

  • Talk

    Room 1

    100 billion messages in Kafka: load and forget

    Apache Kafka is a great tool for reliably passing messages between services, but offloading its content for offline analytics has proven to be no easy task, especially when we're talking about hundreds of billions of messages a day, every day. Apache Spark comes to the rescue, but unfortunately its capabilities alone aren't enough to work reliably and fully automatically on really large data volumes. The speaker will talk about how to offload 100 billion messages a day from Apache Kafka to HDFS and stop thinking about it.

    The talk will be of interest to developers in Big Data who use Kafka to transfer large amounts of data to Hadoop.

  • Talk

    Room 1

    Path to the data model for the daily update of the past 100 days

    A story about how we chose a data model for a data store in which we have to update the last 100 days of data every day. We will look at point-to-point block replacements, the single-key table approach, Data Vault, and a couple of other approaches, and pick a winner for our task.

  • Talk

    Room 2

    Ingest layer of the data platform: mix but do not shake

    A story about how the speaker's team built an Ingest layer for internal and external sources within the SberHealth data platform and did not forget about working with sensitive data and the data directory. Since the platform has to abstract the components underneath, we'll talk about the DSL with which to manage it all.


  • Talk

    Room 1

    Automated Spark application tuning

    Valeria will talk about a Hadoop cluster where hundreds of daily and thousands of hourly Spark calculations run. The calculations are all very different, and each has its own SLA. In this situation, it's unrealistic for engineers to tune every job by hand. That's why the team built and deployed a fully automatic tuning system based on the logs that Spark itself writes. Valeria will show how to easily extract a lot of information from these logs offline and what to look for when automatically tuning spark.executor.memory. She will also explain in detail how their tuning system is set up and what allows it to keep adjusting to changes. The talk will be of interest to those who already work with Spark and have an idea of its structure.

  • Talk

    Room 2

    NiFi scripts as an element of Less Code ETL

    NiFi has many transformations that do not require coding, but not everything can be done with out-of-the-box transformations. Developing a processor for each unique transformation is an interesting but expensive option. With scripting in NiFi, you get a more flexible data transformation tool. Bronislav will explain when to choose scripting and how to use it most effectively. This talk is for active NiFi users, as well as for those who are considering NiFi as an ETL tool for their tasks.

  • Talk

    Room 1

    Reliable and scalable data pipelines at OK

    Odnoklassniki has multiple recommendation systems that handle real-time requests from millions of users every day. To maintain the quality of these systems, there are hundreds of pipelines running daily that collect datasets and attributes, train models and roll them out to extensions, run models in batch mode, load attributes into the Feature Store, and much more. But what happens if some of the pipelines stop working?

  • Talk

    Room 2

    Using the GrowthBook platform to manage ML experiments

    This talk describes a way to organise an experiment pipeline, based on the open-source GrowthBook platform, in which the responsibility for launching and testing features lies with the ML development team. The proposed approach is intended to reduce the number of integrations on the side of the core development team while speeding up the delivery of new versions of machine learning models into production.

  • Talk

    Room 1

    What is DevOps in the world of data warehousing?

    Petabytes of data go through PochtaTech's services. Dozens of teams and departments work with it, using a bunch of frameworks and technologies. Most of this data is stored and developed in DataCloud. Vasily will talk about how DevOps practices are used in working with data warehouses and how this can reduce time-to-market.

  • Talk

    Room 2

    How to load everything into the data catalog and not die

    It is not enough to create a convenient data catalog; the biggest job is to fill it with metadata taken from a huge number of different sources.

    In this talk, Ivan will explain why the team had to switch from a pull approach to a push approach, describe the specifics of the technical implementation, and cover the problems they encountered.

    The talk will be useful for those who have already implemented or are thinking of implementing or developing a data catalog.

  • Talk

    Room 1

    Recovering a distributed database after a crash

    Imagine you were editing a document but deleted it by mistake. Rolling back to Report3_release2FinalLast-Fixed!!!4.txt.bak.bak, saved on a flash drive, and re-adding a few changes from memory would fix the problem.

    Now imagine that several people were editing a document online and the server went down. A server backup and coordinated work by the authors of the document would solve the problem.

    Finally, imagine that thousands of people edited millions of documents on hundreds of servers with asynchronous replication to a backup cluster, but a bug in the code caused every millionth change within each cluster to be lost. Is there a solution to such a problem?

    The speaker will explain what to do when code review, failover, and certification did not help avoid a distributed database crash.

  • Talk

    Room 1

    SmartData 2022 Conference Closing

    We will take stock, recall the highlights, and talk about our plans. Come to the room or join the broadcast so you don't miss anything!

  • Discussions

    Live conversation with speakers between activities. No recording and no time limit.
  • BoF

    Informal conversations without hosts or speakers. This is where new ideas are born.
  • Round tables

    Speakers and experts discuss current industry issues.

Bonus

  • Coffee and lunch breaks

    Buffet and beverages of your choice. If you have dietary restrictions, write to our support team. We’ll find a solution.
  • Networking

    Informal atmosphere and heart-to-heart talks. Networking for all participants, speakers, and experts.
Buy a ticket

COVID-19 free zone

At the entrance, we will ask you to show a QR code confirming vaccination (with a Russian or foreign vaccine) or a negative PCR test taken no more than 48 hours before the event. You can also show a QR code confirming a previous illness.

If you have problems getting to the site, email our support — we will help you.

  • How can I access the conference?

    Present one of the following at the entrance:

    • A negative PCR test (valid for 48 hours).
    • A QR code confirming vaccination (Russian or foreign vaccine).
    • A QR code confirming a previous illness.
  • What if I have bought an offline ticket? Will I be refunded?

    You will be refunded if neither of these options suits you:

    • You can join the broadcast and watch everything online. To get a refund of the difference between the offline and online ticket prices, email our support team: support@smartdataconf.ru.
    • You can take an express test on the day of the conference. For details, email our support team: support@smartdataconf.ru.
  • What security measures will be on site?
    • There will be sanitizers and masks. However, it is not obligatory to wear a mask; it is up to you.
    • An ambulance team is constantly on duty at the site.

    Please send all questions and clarifications to support@smartdataconf.ru.

FAQ

  • Where will the Offline day of the conference be held?
    The Offline day will be held on October 29 at Park Inn by Radisson Pulkovskaya, 1 Pobedy Square, Saint‑Petersburg, Russian Federation.
  • When will the program and time for the Offline day of the conference be known?
    We will publish the program on the conference website starting in the second half of September.
  • What activities will be included on the Offline day of the conference?

    The Offline day will include:

    • talks
    • round tables
    • BoF sessions: interest-group meetings without a fixed schedule
    • discussions with offline and online speakers who will come to the venue
  • Will there be an online broadcast of the Offline day of the conference?

    We will broadcast most of the Offline day's activities live: talks, round tables, etc.

    Discussions and BoF-sessions will not be broadcast or recorded.

  • It has been so long since the last offline event that I no longer remember how offline conferences work.
    Don’t worry: before the conference, we will send you a participant’s guide with all the necessary information.
  • Can I buy a ticket only for the Offline day of the conference?
    To attend the Offline Day, you must purchase an "Online+Offline" ticket. It entitles you to attend the offline day of the conference and gives you access to the recordings of the online day.
  • How do I get into Offline Day if I have a "Double Online" ticket?
    If you already have a ticket for the online part of the conference, you can upgrade it to "Online+Offline". To do so, email our support team at support@smartdataconf.ru.
  • How do I get to the Offline day if the company only paid for my "Double Online" ticket?
    If the company that paid for your ticket is not willing to upgrade to Offline, you can do it yourself at a discount. The discount is given for taking the survey after the online part of the conference ends.
  • Is there a limit to the number of tickets for the Offline day?

    The number of tickets is limited by the capacity of the conference venue.

    So it is better to buy tickets in advance while they are available.

  • Are there any restrictions on going to an offline conference?

    At the entrance, we will ask you to show a QR code confirming vaccination (with a Russian or foreign vaccine) or a negative PCR test taken no more than 48 hours before the event. You can also show a QR code confirming a previous illness.

    If you have problems getting to the site, email our support — we will help you.

  • What will happen to the Offline Day if there is a new wave of COVID-19?

    The incidence of COVID-19 is increasing, so we are setting new rules for visiting offline sites.

    At the entrance, we will ask you to show a QR code confirming vaccination (with a Russian or foreign vaccine) or a negative PCR test taken no more than 48 hours before the event. You can also show a QR code confirming a previous illness.

    At venues in Moscow and St. Petersburg, we will arrange seating according to the principles of social distancing. These are tough measures, but they will keep conference participants safe.

    If the situation worsens and offline events are canceled, we will move the Offline Day online. In that case, speakers will give their talks remotely or from our studio. Nothing will change for participants with a "Double Online" ticket. Participants with an "Online+Offline" ticket can convert it into a "Double Online" ticket with a refund of the price difference, or carry it over to the next year. It will also be possible to return the ticket and get a full refund.

    Either way, we will not be postponing the conferences to next year.