
Phil Laszkowicz, Futurice
How to master time and space
Applying MLOps to a high-performance geospatial data platform for the edge and cloud.
Join the conference closing, where we will discuss the most interesting finds of the day, as well as what will be waiting for us tomorrow.
We'll be discussing a wide variety of languages and technologies that data engineers are currently working with.
Alexander will talk about the main characteristics of a modern data platform, differences in DWH architecture, the components used, and the open source Hadoop distribution.
Maria and Olga will present a talk on how to build an analytics system that significantly expands business opportunities using the JVM and open source technologies.
This talk is about building a fail-safe system for an Apache NiFi cluster using Apache Kafka as an input source.
This talk covers S7's experience in building a data platform and how long it took to build it.
Maksim's talk is about the pros and cons of various solutions for storing data: cloud solutions, bare metal solutions, Hadoop, Vertica, ClickHouse, ExaSol, GreenPlum (ArenaDataDB), RDBMS, Teradata, and others.
A Zoom session where we will gather all the attendees, speakers, program committee members, and experts. We will sum up the highlights of the conference and chat with each other informally, like in the good old pre-COVID times. The only difference is that it will be in Zoom, since, unfortunately, these are COVID times.
Join via the link below the player!
Join the SmartData closing with the program committee: we will discuss the most interesting talks and discussions, as well as the talks we would like to see return after the conference.
Evgeny will talk about trends in the Modern Data Stack, the pros and cons of the old (ETL) and new (ELT) approaches, and the reasons that led his team to create their own DSL.
We'll talk about the evolution of big data at Spotify, from Python, Hadoop, Hive, Storm, and Scalding to today's world of cloud and serverless computing.
Let's talk about some technologies that can help you get more out of your machine: JIT, BLAS, and parallelism.
Migrating a table is not a problem when the database is stopped. But what if you need to migrate while the database is running? Nikolay will address this with practical tips for PostgreSQL.
Vladislav will talk about versioning a database structure, using Lamoda's storage as an example.
During this session, we'll talk about architecture, why Staroid chose Kubernetes, what the challenges were, and how the company solved them. You will also see a working demo so you can get an idea of what the Serverless Spark experience looks like and how it can benefit your work.
Stanislav wants to share an example of how you can replace centralized S3 data storage with a more accessible solution and organize policies so that data processing becomes more efficient. He will also explain where multigraphs, homomorphic cryptography, multi-pass games, zero-knowledge proofs, and other mathematics come in.
In his talk, Pavel will tell you what caused data fragmentation in his organization, and what typical analytics scenarios suffer as a result. He will also explain why the classic approach did not work for Deutsche Bank and what they learned to do differently.
Applying MLOps to a high-performance geospatial data platform for the edge and cloud.
This talk is about the principles of building a new database from scratch for working with logs and telemetry.
Find out what awaits you in the next 4 days. The program committee will talk about the schedule, interesting talks, and in what format they will be held. The team of organizers, in turn, will tell you how our platform works, where discussion zones will be held, how to connect to chat rooms, and where to ask questions.
Pasha and Vitaliy will talk about what data engineers choose and why they decided to make an API for one of the most popular frameworks for building pipelines.
This talk is a gentle introduction to the latest and greatest of Delta Lake. You will learn what Delta Lake is and what challenges it aims to solve.
Join the conference closing, where we will discuss the most interesting finds of the day, as well as what will be waiting for us tomorrow.
Join us for a presentation of a new JetBrains product: the Big Data Tools plugin. We will discuss its most significant use cases and provide a short demonstration using real-world examples. All questions will be answered by the developers directly involved in BDT development.
Join the conference closing, where we will discuss the most interesting finds of the day, as well as what will be waiting for us tomorrow.
Vladimir will talk about the motivation behind developing your own ETL tool and about transforming ETL and DWH into a DMP. The speaker will share what problems arise during DMP development and talk about the experience of solving them.
How do you make Spark + Scala jobs and Python apps friends? Andrey will explain why it's worth doing and how to write pipelines with reusable blocks and a flexible architecture using Dagster.
A not-so-high-quality DS model is in production, and now there is no way to retrain or update it. To avoid ending up in this situation, come and listen to Mikhail's talk on the topic.
How does data from wearable devices travel to the user interface of the Digital Worker system?
During this session, we will talk about a popular approach to data processing, stream processing, with a focus on working with state.
Using the history of building storage for an advanced web analytics service as an example, Artur will describe how the storage and reporting system in his project has evolved over the past 5 years.
The DWH structure is not very flexible, and modern design approaches help fix this: Data Vault and Anchor modeling. Eugene and Nikolay will tell you more about which to choose.
We will talk about NiFi as an ETL and data ingestion tool for streaming. Bronislav will describe some practices and advice that Tinkoff uses.
In this talk, Jeff will show how to use Flink on Zeppelin to build your own streaming data analytics platform.
During this session, Alexander will explain what makes Kusto (Azure Data Explorer) different from other solutions, show how complex analysis of live telemetry across billions of records can take seconds, and pull back the curtain on the architecture on which Kusto is built.
Join the conference closing, where we will discuss the most interesting finds of the day, as well as what will be waiting for us tomorrow.