Organizing access to Zalando's Data Lake

Day 1 / Hall 3

Program Committee comment:

Working with metadata often does not get the attention it deserves, but it is an extremely useful topic! When building a DWH, and even more so a Data Lake, you need to understand which data lives where and in what quality. That is why we could not pass this talk by.

In this talk, Dmitry gives a retrospective on the development of a data lake at one of Europe's biggest e-commerce companies. The topics revolve around organizing access to unorganized data. The talk recaps the experience of Dmitry and his team with access management, metadata management, execution engines, visualization tools, data governance, and machine learning enablement.

This talk will be of interest to data engineers who work with large volumes of unorganized data. Its purpose is to show the journey of Dmitry and his team from "We have all the data" to "We can make use of our data".

Technologies they use: Presto, Apache Hive, AWS S3, Apache Superset, Apache Airflow, and JupyterHub.