Schedule

The program has not been finalized yet, so some changes are still possible.

Click on the title to read the talk description. All talks with their short descriptions are here.

Day 1. October 11

Time UTC+03:00  & Track
Lecture
Track 1
Track 2
Track 3

16:45 - 17:00

Conference opening

17:00 - 18:00
17:00 Track 1
17:00 Track 2
DWH as a product
Evgeny Nikolaev
Avito
#dataasaproduct  #process 
17:00 Track 3
Greenplum and Anchor modeling: How dreams shatter against reality
Evgeny Ermakov
Yandex Go
Nikolay Grebenshchikov
Yandex Go
#dwh  #anchor  #DataModeling  #datavault  #architecture 
18:00 - 18:30

Break

18:30 - 19:30
18:30 Track 1
Delta Lake data layout optimization
Sabir Akhadov
Databricks Inc
#storageoptimization  #storage 
18:30 Track 2
18:30 Track 3
Hadoop 3: Erasure coding catastrophe
Denis Efarov
Mail.ru Group
#Storage  #hadoop 
19:30 - 20:00

Break

20:00 - 21:00
20:00 Track 1
Lessons learned from using machine learning to optimize database configurations
Andy Pavlo
Carnegie Mellon University
#perfomance  #datastorage  #databaseoptimization  #tuning 
20:00 Track 2
"Functional" Spark
Dmitry Zuev
Ozon
#developer  #scala 
20:00 Track 3
Trino (Presto) DB: Zero copy lakehouse
Artem Aliev
Huawei
#queryoptimization  #datavirtualisation  #queryengine  #tooling 

Day 2. October 12

Time UTC+03:00  & Track
Lecture
Track 1
Track 2
Track 3

17:00 - 18:00
17:00 Track 1
An experience report on strategies for working with Cloud Storage
Tejas Chopra
Netflix
#storageoptimization  #cloud  #architecture 
17:00 Track 2
17:00 Track 3
18:00 - 18:30

Break

18:30 - 19:30
18:30 Track 1
18:30 Track 2
18:30 Track 3
Airflow 2.x SaaS
Mikhail Solodyagin
Tele2
Sergey Yunk
Tele2
Vadim Suhanov
Tele2
#airflow  #k8s  #cloud 
19:30 - 20:00

Break

20:00 - 21:00
20:00 Track 1
How to bring advanced analytics to hybrid data storage with Vertica
Gianluigi Vigano
Vertica
Maurizio Felici
Vertica
Marco Gessner
Vertica
#process  #datavirtualization  #architecture  #database  #storage  #queryengine 
20:00 Track 2
How to design a high-performance distributed SQL engine
Vladimir Ozerov
Querify Labs
#queryoptimization  #queryengine  #tooling 
20:00 Track 3

Day 3. October 13

Time UTC+03:00  & Track
Lecture
Track 1
Track 2
Track 3

17:00 - 18:00
17:00 Track 1
17:00 Track 2
Insert into ClickHouse and not die
Artem Shutak
Mail.ru Group
#storage  #dataingestion  #optimization 
17:00 Track 3
18:00 - 18:30

Break

18:30 - 19:30
18:30 Track 1
Dremio SQL Lakehouse: Fast data for all
Viktor Kessler
Dremio
#queryoptimization  #lakehouse  #queryengine  #datalake  #tooling 
18:30 Track 2
18:30 Track 3
19:30 - 20:00

Break

20:00 - 21:00
20:00 Track 1
20:00 Track 2
20:00 Track 3

Day 4. October 14

Time UTC+03:00  & Track
Lecture
Track 1
Track 2
Track 3

17:00 - 18:00
17:00 Track 1
17:00 Track 2
17:00 Track 3
18:00 - 18:30

Break

18:30 - 19:30
18:30 Track 1
Create a git-like experience for Data Lake analytics
Itai Admi
Treeverse
#datavirtualisation  #tooling 
18:30 Track 2
18:30 Track 3
19:30 - 20:00

Break

20:00 - 21:00
20:00 Track 1
20:00 Track 2
How we build Feature Store
Sergey Yarymov
MTS
#featurestore  #MLops 
20:00 Track 3
Round table: What if not Hadoop
Nikolay Markov
Aligned Research Group
Maksim Statsenko
Yandex
Natalia Khapaeva
MTS
Nikolay Troshnev
Valdis Pukis
Evolution
#hadoop  #storage  #datalake 
21:00 - 21:15

Conference closing

Despite his education in psychology, Pasha has spent 14 years working in many areas of IT: system administration, development, management, and data engineering. In short, he has touched almost everything that exists in IT. He started practicing DevOps more than 10 years ago and has never focused on just one thing. Pasha now works at JetBrains on Big Data Tools, a set of tools that make a data engineer's life easier. He is very sociable, loves and understands people, and is always happy to answer any questions.

Maurizio Felici

Vertica Field Chief Technologist. Maurizio started writing complex code in Fortran in 1985, during his Master’s degree in Physics, when he built sensors and software to capture and analyse gravitational wave signals. He started working in 1986, coding Unix device drivers. In 1992 he started working with databases, and he implemented his first large Data Warehouse in 1998 while at Oracle. In 2006 Maurizio joined Hewlett-Packard and started working with large MPP databases. In 2011 he began working with columnar databases at Vertica. Maurizio knows several databases, many programming languages, and different Data Warehouse architectures. He has coded several tools to move data from one database to another, assess database throughput, and analyse query performance. Maurizio has also contributed to the development of Vertica Federated Queries.

Marco Gessner

Vertica Field Chief Technologist. Has worked with relational databases since 1989 and with data warehouses since 1992/1993. Has worked for Vertica ever since HP bought the company in 2011. Specializes in Big Data architectures and data warehousing ecosystems.

Gianluigi Vigano

Gianluigi is a Software Engineer located in Milan. His expertise lies in Data Architecture with a focus on Information Extraction. He works with the R&D team to improve Vertica's integration with the open source ecosystem (Hadoop, Kafka, Spark…). Before joining Vertica, Gianluigi worked at several Information Technology companies as a System Engineer and Technical Architect for parallel cluster and parallel database architectures.

An infrastructure engineer with almost 10 years of software development experience in various programming languages and platforms. About 8 years of Python programming experience and ~3 years of using Go, along with good knowledge of web technologies.

Teaching, mentoring, writing and translating articles on Python, Linux, Big Data, clouds, networking and algorithms. Expertise includes distributed and high-performance systems, networking, algorithms, concurrency/parallelism, capacity planning and basic statistical data analysis. DevOps and CI/CD enthusiast.

Maksim Statsenko

"If artificial intelligence is our future, then big data is the coal of the locomotive that will bring us into it".

Maksim has been working with data for 10 years. He has built ETL pipelines and data storage systems, analyzed data, and worked on visualisation at government organizations (RCOI), energy companies (MOEK, GAZPROM), banks (BRC, VTB24), and IT companies (Yandex, Mail.Ru). Big Data is his wife and mistress. He's always ready to talk about it.

Nikolay Grebenshchikov

Over 15 years of experience in the IT field. For the last 1.5 years, Nikolay has been developing data storage at Yandex Go. Specializes in MPP Greenplum DBMS.

More than 10 years of experience in IT. Architect of data warehouses and analysis systems at Mail.ru Group and Yandex Go. Candidate of Technical Sciences, author of more than 10 papers in data analysis, co-author of a monograph on the theory and practice of parallel database analysis.

Artem is a Huawei expert in big data technologies and graph databases. Before that, he integrated Spark, TinkerPop, Cassandra at Datastax, led a data storage performance optimization team at EMC, and developed Apache Harmony J2SE.

Nikolay Golov

Nikolay is the Head of Data Engineering at ManyChat (a SaaS startup), responsible for the implementation and growth of its Data Platform (AWS+Redis+Snowflake+Tableau). Previously, from 2013 to 2019, he headed the Data Platform of Avito, the Craigslist of Russia, which grew from a small startup into a multi-billion dollar company. At Avito he was responsible for analytical databases (Vertica, ClickHouse), OLTP engines (PostgreSQL, Redis, MongoDB), and data buses (Kafka) for analytics and microservices integration. In parallel with those jobs, Nikolay is a researcher at the Higher School of Economics in Moscow, Russia, with several international publications on data warehousing (Anchor Modeling) and aspects of big data processing.

Dmitry Bugaychenko

Graduated from St. Petersburg State University in 2004 and received a PhD in the field of formal logical methods in 2007. Spent almost 9 years in outsourcing without losing contact with the university and research community. Big data analysis at Odnoklassniki gave Dmitry a unique chance to combine theoretical knowledge and a scientific foundation with the development of real and popular products, and he gladly took that chance by joining the company in 2011. Joined the Sberbank team in 2019.

Vladimir Ozerov is the founder of Querify Labs, where he manages the research and development of innovative data management products for technology companies. Before that, Vladimir worked on in-memory data platforms Apache Ignite and Hazelcast for more than eight years, focusing on distributed data processing. Vladimir is a committer to Apache Calcite and Apache Ignite projects.

Tejas Chopra

Tejas Chopra is a Senior Software Engineer on the Data Storage Platform team at Netflix, where he is responsible for architecting storage solutions to support Netflix Studios and the Netflix Streaming Platform. Over his career, Tejas has worked on distributed file systems and backend architectures, in both on-premise and cloud environments, at several startups. Tejas is an International Keynote Speaker who periodically conducts seminars on microservices, NFTs, Software Development, and Cloud Computing. He has a Master's degree in Electrical & Computer Engineering from Carnegie Mellon University, with a specialization in Computer Systems.

Sabir Akhadov

Sabir is a software engineer at Databricks working on optimizing physical data layouts for the best performance. Before that, he worked in Databricks performance engineering and benchmarking team.

Sabir was born in Kazakhstan and since then has lived in 4 different countries. He's interested in learning new languages, technologies, and sports, mostly powerlifting and Russian kettlebells.

After many years in Software Development as a developer, technical lead, DevOps engineer, and architect, Aleks focused on cloud computing and distributed systems. A Professional Cloud Architect and Developer Advocate, he shares his knowledge and expertise in the field of high-performance and disaster-tolerant systems.

Ash has been a contributor to Airflow for almost four years and has been a member of the Project Management Committee (a.k.a. the Core team) for almost as long. He was the Release Manager for much of the 1.10 release series, and he also rewrote much of the Scheduler internals to make it highly available and increase performance by an order of magnitude (AIP-15).

Outside of Airflow he is the Director of Airflow Engineering at Astronomer.io, where he runs the team of developers who contribute to the open source Airflow project.

Andy Pavlo is an Associate Professor of Databaseology in the Computer Science Department at Carnegie Mellon University. He is also the co-founder of OtterTune.

Jacek is an IT freelancer specializing in Apache Spark, Delta Lake, Apache Kafka and Kafka Streams (with brief forays into the wider data engineering space, e.g. Presto). Jacek offers software development and consultancy services with very hands-on, in-depth workshops and mentoring. He is best known for his online books, available free of charge at https://books.japila.pl/.

Valerie Wiedemann

Valerie began her career as a Pre-Sales Engineer at EXASOL in 2018, providing technical consulting to prospects, the future customers of Exasol. Her responsibilities included deep dives into EXASOL's product capabilities and features, preparing testing environments, delivering POCs, and building SOWs for Data Warehouse migrations to EXASOL. The portfolio of customers Valerie has worked with includes some of the largest insurance and retail organizations in Germany and Central Europe.

Andrey Terekhov

Engineer with over 10 years of hands-on experience in IT. For the past 4 years, Andrey has been dealing with large distributed systems and, in particular, data delivery systems, which he has gradually combined into a universal data delivery service — Yandex DataTransfer.

Graduate of MSU Faculty of Computational Mathematics and Cybernetics. More than 14 years of experience in fintech and telecom as a developer, architect, expert of data governance, and product owner. Now he builds the MLOps platform at MTS.

Nikolay Troshnev

10 years at MTS, in data analytics and numerical marketing, marketing strategy, then headed the functions of data science and data governance, the Big Data team. For 1.5 years as the executive director — chief data scientist (CDS) of Sber, working with distressed assets. For 2 years worked as the leader of the Big Data team of the Social Block of the Moscow Government. Now Nikolay is a private consultant, open to new projects.

Valdis Pukis

Trying to do something useful with data since 1993 as DBA, DBA team lead, DB/DWH developer. Has experienced the ups and downs of different approaches to data processing. Today Valdis is the data processing team lead at Evolution.

Dmitry Ibragimov

If "data is the new oil", then Dmitry is responsible for every step of working with it, from well drilling and production to refining and transportation. Dmitry has been building and maintaining data warehouses and data lakes in companies and startups on the Apache technology stack (Hadoop, Hive, Impala, Spark) for the past 8 years. At Leroy Merlin he built a ~500TB data platform based on a Greenplum DWH, with a lake on top of S3, NiFi and Flink ETL tools, and an operational layer on ClickHouse. A fan of open source and a good conversation partner.

For the last 5 years, Artem has been working in the area of Big Data. In this area, he has come across very different projects, from publishing whitepapers on NoSQL database benchmarks to writing standard pipelines. He works for Profitero as a tech lead of data engineers. In his free time, Artem tries to take part in different open source projects.

Graduated from the Moscow Institute of Physics and Technology, then moved from physics to the creation of IT products. Supervised products at Gazprombank and Otkrytie. Co-founder of the COVI Retail startup. At the moment he is engaged in projects with EDGE computer vision at MTS.

Founder of Infoculture, created to popularize the openness of data, the state, digital preservation, and other related technological public topics. He also develops APICrafter/DataCrafter startup to create catalogs and data lakes, primarily based on open data.

Before that, Ivan created state, private and public information systems and IT products.

Ekaterina Kolpakova

Head of DWH at Citymobil. Developed DWH (BigData) at Tinkoff and Mail.Ru Group. Lecturer of the open course "Designing Big Data Warehouses" at the Mail.Ru Technopark at the BMSTU and MSU.

DataStore Enthusiast, Doodle Maker, Tango Lover & fellow coder.

Currently a senior data engineer at eyeota.com — the world's largest audience data marketplace. Formerly at Flipkart.com — India's largest e-commerce company, was part of its data team, MySQL engineering team, website & warehouse/order management teams.

Christian Langmayr heads the development of the global Exasol Community with End Customers, Academics, Partners, and technology Alliances. He is passionate about keeping and growing the special spirit that goes beyond the software developed and strives for positive interactions between all parties to drive the development of individuals involved. He has more than 15 years of experience in the IT industry with previous positions in MicroStrategy and Toshiba. Christian holds a degree in Business Administration from the Catholic University in Eichstätt, specializing in Services Management and Marketing. His focus is on supporting business growth, improving processes, and developing a data analytics ecosystem that empowers Exasol to grow in its relevant markets.

Evgeny Nikolaev

Graduated from the MSU Faculty of Computational Mathematics and Cybernetics in 2015. He worked as a programmer for more than 6 years and has been managing teams for more than 3 years. He is now the lead of the DWH unit at Avito. A fan of cool products, he implements the DWH strategy as a product. In his free time, Evgeny plays football (captain of the Avito team) and chess, and learns Spanish (B2).

Sergey Yarymov

Data Engineer at MTS Big Data and lead of the data platform development unit. Built an ETL platform for the internal fintech stream, and took part in the development of the BDaaS (Big Data as a Service) product and the MTS Big Data ETL Framework. Currently developing Feature Store.

Ton is an Engineer passionate about Machine Learning and AI. Before joining Synthesized, he worked for a challenger bank in the UK improving their decision process by exploiting their data, and before that, he obtained his MSc in Artificial Intelligence at the University of Edinburgh.

Nikolay Valiotti

PhD in Economics, worked in major Russian companies: built analytics at the Lenta network, was responsible for analytical processes at Yota, did forecasting at Baltika, headed the analytics department and then the marketing department at Yulmart, headed the Data & BI direction at US company Airpush. In 2019, he founded Valiotti Analytics, where he provides analytics consulting for mobile and digital startups. Co-founder of the open source self-service BI platform Mprove. Author of the blog leftjoin.ru.

Kirill Rybachuk

8 years in the machine learning industry, 4 years in developing computer vision systems at Cherry Labs. Interested in building ML pipelines, optimizing models, making stuff automated and flexible, for the needs of both production and research.

Dmitry Zuev

Dmitry has been developing in Scala since 2014, developing everything from simple CRUD APIs to stateful distributed services. In recent years he has been working on DE and developing different kinds of DE tools.

Vadim Suhanov

For the last year, Vadim has been working in the Big Data team at Tele2: he builds pipelines, develops internal frameworks, and has started contributing to Airflow. Before that, he worked at Cian as a lead developer, was there at the start of its rapid growth, and was involved in developing many of the features that exist on the site.

Sergey Yunk

Sergey has over 5 years of experience in DevOps and SRE. Previously, he was involved in the development of the Observability and IaaC directions within the TK Center. Now he's helping to develop Tele2's own Hadoop distribution. He is also actively developing the SaaS approach in the BigData sphere.

Mikhail Solodyagin

For more than 6 years, Mikhail has been implementing DevOps practices and ubiquitous automation. He's one of the developers of SaaS cloud Bit.Live, he also successfully defeated the ancient manual monolith "TK Center", moving it onto comfortable IaaC rails. Now Mikhail is a part of the Hadoop distribution development at Tele2 and is also involved in the development of SaaS/PaaS solutions in the BigData team.

Denis Efarov

Denis has worked in BigData, mostly with Hadoop, since 2013, and is now a lead developer at Mail.ru Group. He has been designing and developing a platform for storing and processing statistical data for the Odnoklassniki project since 2018.

Artem Shutak

IT engineer and architect with 10 years of experience. For the last 7 years, he has been working on distributed systems in general and Big Data in particular. Artem is now a lead developer on the Data Platform team at Mail.ru Group/OK.RU. He worked with data at Grid Dynamics for 4 years, where he went from a Data Engineer to a Data Architect role. He also used to be a full-time Apache Ignite contributor, which is why Artem knows how distributed systems work under the hood.

Roman Kondakov

Roman was involved in building the distributed SQL engine for Apache Ignite at GridGain Systems. Then he worked at Yandex, where he worked on Yandex Query Language. Now he works at Querify Labs, which advises technology companies on database development.

Itai Admi

Itai is an R&D team leader at Treeverse, the company behind open source lakeFS. He thrives on finding creative solutions for complex problems, especially if they involve code. Previously, Itai worked at Microsoft and Ridge on data infrastructure, tooling, and performance. Itai received his B.S. degree in Computer Science and an MBA from Tel Aviv University.

Dmitry Anoshin

Analytics and Data Engineer Leader with 10+ years of experience working in Business Intelligence, Data Warehouse & Data Integration, BigData, Cloud, and ML space across North America and Europe.

Apart from work, Dmitry teaches a Cloud Computing course at the University of Victoria, mentors high school students at the CS faculty, and volunteers his time to coach people in the CIS region in analytics engineering skills. He is also the author of analytics books and a speaker at data-related conferences and user groups.

Viktor Kessler has been a Sr. Solutions Architect at Dremio since December 2019. Before joining Dremio, he spent multiple years working at MongoDB, ERGO, and PwC as a Solutions Architect on Big Data, DW, and digital transformation projects.