Delta Lake data layout optimization

In English

In this talk, Sabir will walk you through the physical data layout optimizations available with Delta Lake. He will first discuss the factors that make a query execute fast. He'll then outline different ways users can optimize their workloads by making sure their data is organized in the best way possible. In particular, this talk will look at data partitioning, bucketing, and Z-ordering. It will discuss factors such as data clustering, statistics, optimal file sizes, and Parquet row group sizes. Finally, Sabir will give you a sneak peek at what the team at Databricks is currently working on to push performance to the next level.

#storage
#storageoptimization

Speakers

Sabir Akhadov
Databricks Inc

Invited experts

Сергей Бойцов
JetBrains