Data Lineage: How to Set It Up in a Technology Zoo and Why It Is Needed

Data Lineage tracks the path of data from its source to its final use, covering every transformation and movement along the way.
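As a rough illustration of the idea, lineage can be thought of as a directed graph whose nodes are datasets and whose edges point from a source to whatever is derived from it. The sketch below is a toy model in plain Python, not DataHub's API; the class `LineageGraph` and the dataset names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage model: datasets are nodes, edges point source -> derived."""

    def __init__(self):
        self._downstream = defaultdict(set)  # source -> datasets derived from it
        self._upstream = defaultdict(set)    # derived dataset -> its sources

    def add_edge(self, source, target):
        """Record that `target` is produced from `source`."""
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _walk(start, edges):
        """Breadth-first traversal collecting every node reachable from `start`."""
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream_of(self, dataset):
        """Everything `dataset` depends on, directly or transitively."""
        return self._walk(dataset, self._upstream)

    def downstream_of(self, dataset):
        """Everything affected if `dataset` changes or breaks."""
        return self._walk(dataset, self._downstream)


# Hypothetical pipeline: Kafka topic -> S3 -> dbt model -> Tableau dashboard
graph = LineageGraph()
graph.add_edge("kafka.orders", "s3.raw_orders")
graph.add_edge("s3.raw_orders", "dbt.orders_clean")
graph.add_edge("postgres.customers", "dbt.orders_clean")
graph.add_edge("dbt.orders_clean", "tableau.revenue_dashboard")

print(sorted(graph.upstream_of("tableau.revenue_dashboard")))
print(sorted(graph.downstream_of("kafka.orders")))
```

The two queries correspond to the two questions lineage usually answers: where a report's numbers come from (upstream) and what is affected when a source changes (downstream).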

In this talk, we will discuss how to set up Data Lineage with DataHub across heterogeneous sources, and the benefits it brings to data engineers and the business: better-informed decisions, lower operational risk, and greater trust in data.

We will walk through integrating DataHub with a range of systems, including Kafka, Trino, dbt, Airflow, PostgreSQL, ClickHouse, S3, OpenAPI, Feast, Tableau, and Metabase. We will also briefly review real-life examples of successful implementations, such as recovering lost time to market (TTM) and improving coordination between teams.

The talk will be useful to both technical specialists and managers who want to improve data quality and make business processes more transparent.