Offline day

10:00–22:30 (UTC+3)

Offline: Park Inn by Radisson Pulkovskaya, 1 Pobedy Square, Saint Petersburg, Russian Federation

Online broadcast

Why It’s Worth Going

  • Talk in person

    To see old friends. To discuss current problems. To come up with new ideas. To debate and just chat.
  • Switch the format to offline

    To have a change of scenery, unwind, and have a good time. To gain fresh impressions and make new acquaintances.

Broadcast

The Offline day of the conference will be broadcast, and the broadcast is available to participants with any ticket. If you want to meet and interact with the speakers and other participants live, we are waiting for you at the venue. And if you cannot make it to the venue, recordings of all the talks and activities will be waiting for you on this website.

Program

  • Watch recording

    Talk type: Conversation

    In-person Opening SmartData 2022

    We will go over the schedule and sessions and share practical information. Come to the room or join the broadcast to find out what to expect!

  • Watch recording

    Talk type: Talk

    100 billion messages in Kafka: load and forget

    Apache Kafka is a great tool for reliably passing messages between services, but offloading its content for offline analytics has proven to be no easy task, especially when we're talking about hundreds of billions of messages a day, every day. Apache Spark comes to the rescue, but unfortunately, its capabilities aren't enough to work reliably and fully automatically on really big data volumes. The speaker will talk about how to offload 100 billion messages a day from Apache Kafka to HDFS and stop thinking about it.

    The talk will be of interest to developers in Big Data who use Kafka to transfer large amounts of data to Hadoop.

  • Watch recording

    Talk type: Talk

    Path to the data model for the daily update of the past 100 days

    A story about how we chose a data model for a data store in which we have to update the last 100 days of data every day. We will look at point-to-point block replacements, the single-key table approach, Data Vault, and a couple of other approaches, and pick the winner for our task.

  • Watch recording

    Talk type: Talk

    Ingest layer of the data platform: mix but do not shake

    A story about how the speaker's team built an Ingest layer for internal and external sources within the SberHealth data platform without neglecting sensitive data handling and the data directory. Since the platform has to abstract the components underneath, we'll talk about the DSL used to manage it all.


  • Watch recording

    Talk type: Talk

    Automated Spark application tuning

    Valeria will talk about a Hadoop cluster where hundreds of daily and thousands of hourly Spark calculations run. The calculations are all very different, and each has its own SLA. In this situation, it's unrealistic for engineers to tune them by hand. That's why the team built and implemented a fully automatic tuning system based on the logs that Spark itself writes. Valeria will show you how to easily extract a lot of information from these logs offline and what to look for when automatically tuning spark.executor.memory. She will also explain in detail how their tuning system is set up and what allows it to constantly adjust to changes. The talk will be of interest to those who already work with Spark and have an idea of its structure.

  • Watch recording

    Talk type: Talk

    NiFi scripts as an element of Less Code ETL

    There are many transformations in NiFi that do not require coding. But not everything can be done with out-of-the-box transformations. Developing a processor for each unique transformation is an interesting but expensive option. In NiFi, you can use scripting and get a more flexible data transformation tool. Bronislav will tell you when to choose scripting and how to do it most effectively. This talk is for active NiFi users, as well as for those who are considering NiFi as an ETL tool for their tasks.

  • Watch recording

    Talk type: Talk

    Using the GrowthBook platform to manage ML experiments

    In this talk, we will describe how to organize an experiment pipeline, built on the open-source GrowthBook platform, in which responsibility for launching and testing features lies with the ML development team. The proposed approach is intended to reduce the number of integrations on the core development team's side while speeding up the delivery of new versions of machine learning models to production.

  • Watch recording

    Talk type: Talk

    What is DevOps in the world of data warehousing?

    Petabytes of data go through PochtaTech's services. Dozens of teams and departments work with it, using a bunch of frameworks and technologies. Most of this data is stored and developed in DataCloud. Vasily will talk about how DevOps practices are used in working with data warehouses and how this can reduce time-to-market.

  • Watch recording

    Talk type: Talk

    How to load everything into the data catalog and not die

    It is not enough to create a convenient data catalog; the biggest job is to fill it with metadata taken from a huge number of different sources.

    In this talk, Ivan will explain why they had to switch from a pull approach to a push approach, and cover the specifics of the technical implementation and the problems they encountered.

    The talk will be useful for those who have already implemented or are thinking of implementing or developing a data catalog.

  • Watch recording

    Talk type: Talk

    Recovering a distributed database after a crash

    Imagine you were editing a document, but deleted it by mistake. Rolling back to Report3_release2FinalLast-Fixed!!!4.txt.bak.bak, saved on a flash drive, and adding a couple of changes from memory would fix the problem.

    Now imagine that several people were editing a document online and the server went down. A server backup and coordinated work by the authors of the document would solve the problem.

    Finally, imagine that thousands of people edited millions of documents on hundreds of servers with asynchronous replication to a backup cluster, but a bug in the code caused every millionth change within each cluster to be lost. Is there a solution to such a problem?

    The speaker will tell you what to do when code review, failover, and certification have failed to prevent a distributed database crash.

  • Discussions

    Live conversation with speakers between activities. No recording and no time limit.
  • BoF

    Informal conversations without hosts or speakers. This is where new ideas are born.
  • Round tables

    Speakers and experts discuss current industry issues.

Bonus

  • Coffee and lunch breaks

    Buffet and beverages of your choosing. If you have dietary restrictions, write to our support team. We’ll find a solution.
  • Networking

    Informal atmosphere and heart-to-heart talks. Networking for all participants, speakers, and experts.

COVID-19

We have eliminated the COVID restrictions on site visits. Now you don’t need QR codes or PCR tests to enter the venue.

However, if you’re feeling unwell, it’s best to refrain from attending in person. It’s important to take care of yourself and those around you.

You will be able to watch the conference broadcast online, and you can get a refund of the difference in ticket price or exchange your ticket for one for the next season. If you can’t attend in person, email our support team and we’ll help you.

  • How can I access the conference?

    Only a ticket is required to attend the conference. QR codes and PCR tests are not required to enter the venue. However, if you feel unwell, it is better to refrain from attending in person. It is important to take care of both yourself and those around you.

  • What if I have bought an offline ticket? Will I be refunded?

    If you get sick and can’t attend in person, you’ll get your money back if watching the online broadcast doesn’t work for you.

    You can connect to the broadcast and watch everything online. To get a refund of the difference between the "Double Online" and "Online+Offline" tickets, email our support team: support@smartdataconf.ru.

  • What security measures will be on site?
    • There will be sanitizers and masks. However, it is not obligatory to wear a mask; it is up to you.
    • An ambulance team is constantly on duty at the site.

    Please send all questions and clarifications to support@smartdataconf.ru.

FAQ

  • Where will the Offline day of the conference be held?
    The Offline day will be held on October 29 at Park Inn by Radisson Pulkovskaya, 1 Pobedy Square, Saint Petersburg, Russian Federation.
  • When will the program and time for the Offline day of the conference be known?
    We will publish the program on the conference website starting in the second half of September.
  • What activities will be included on the Offline day of the conference?

    The Offline day will include:

    • talks
    • roundtables
    • BoF sessions: interest-based meetups without a fixed schedule
    • discussions with offline and online speakers who will come to the venue
  • Will there be an online broadcast of the Offline day of the conference?

    We will broadcast most of the Offline day activities live: talks, roundtables, etc.

    Discussions and BoF-sessions will not be broadcast or recorded.

  • Offline was so long ago that I no longer remember what the procedure was for offline conferences.
    Don’t worry, before the conference we will send you a participant’s memo. It will contain all the necessary information.
  • Can I buy a ticket only for the Offline day of the conference?
    To attend the Offline Day, you must purchase an "Online+Offline" ticket. It entitles you to attend the offline day of the conference and gives you access to the recordings of the online day.
  • How do I get into Offline Day if I have a "Double Online" ticket?
    If you already have a ticket for the online part of the conference, you can upgrade it to "Online+Offline". To do so, email our support team at support@smartdataconf.ru.
  • How do I get to the Offline day if the company only paid for my "Double Online" ticket?
    If the company that paid for your ticket is not willing to upgrade to Offline, you can do it yourself at a discount. The discount is given for taking the survey after the online part of the conference ends.
  • Is there a limit to the number of tickets for the Offline day?

    The number of tickets is limited by the capacity of the conference venue.

    So it is better to buy tickets in advance while they are available.

  • Are there any restrictions on going to an offline conference?

    We have eliminated the COVID restrictions on site visits. Now you don’t need QR codes or PCR tests to enter the venue.

    However, if you’re feeling unwell, it’s best to refrain from attending in person. It’s important to take care of yourself and those around you.

  • What will happen to the Offline Day if there is a new wave of COVID-19?

    So far we haven’t seen an increase in COVID-19 incidence, so we’re lifting the COVID restrictions on in-person attendance. Now you don’t need QR codes or PCR tests to enter the venue. There will be sanitizers and disposable masks on site. Unless a requirement is introduced by the time of the conference, masks will not be mandatory.

    If you are feeling unwell, it is best to refrain from attending offline. It is important to take care of both yourself and those around you.

    If the situation worsens and offline events are canceled, we will move the Offline Day online. In that case, speakers will give their talks remotely or from our studio. Nothing will change for participants with a "Double Online" ticket. "Online+Offline" participants can convert their ticket into a "Double Online" ticket with a refund of the price difference, or carry it over to the next year. It will also be possible to return the ticket and get a full refund.

    Either way, we will not be postponing the conference to next year.