Delta Lake data layout optimization
In this talk, Sabir will walk you through the physical data layout optimizations available in Delta Lake. The talk will discuss the factors that make a query execute fast. He'll then outline different ways users can optimize their workloads by making sure their data is organized in the best way possible. In particular, this talk will look at data partitioning, bucketing, and Z-order. It will discuss factors such as data clustering, statistics, optimal file sizes, and Parquet row group sizes. Finally, Sabir will give you a sneak peek at what the team at Databricks is currently working on to push performance to the next level.