Talk
Date: 11.10 / Start: 00:00 – Finish: 00:00

Data catalog and data lake based on MongoDB: Building tech stack from scratch

In Russian
Complexity -

The problem: cataloging a large number of unmanaged data sources. The audience: data engineers, data analysts, data solutions developers, data solution architects.

Ivan's talk covers the creation of DataCrafter, a data catalog built on MongoDB for large, heterogeneous public data in complex formats from unmanaged sources.

The catalog includes such rarely implemented features as:

automatic data schema creation;
automatic classification/identification of data types (cadastral numbers, emails, company IDs, links, etc.);
automated documentation;
automatic data quality assessment.
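
As a rough illustration of what schema creation, data type identification, and quality assessment can look like for schemaless records, here is a minimal Python sketch. It is not DataCrafter's implementation: the function names, the regular expressions, and the sample records are all assumptions made for this example, and real identifiers such as cadastral numbers or company IDs would need locale-specific rules.

```python
import re
from collections import defaultdict

# Hypothetical patterns for illustration only; production rules would be
# stricter and would cover many more identifier types.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "url": re.compile(r"^https?://\S+$"),
}


def classify_value(value):
    """Return a semantic label for a string value, else the Python type name."""
    if isinstance(value, str):
        for label, pattern in PATTERNS.items():
            if pattern.match(value):
                return label
    return type(value).__name__


def infer_schema(documents):
    """Map each field name to the sorted labels seen across all documents."""
    schema = defaultdict(set)
    for doc in documents:
        for field, value in doc.items():
            schema[field].add(classify_value(value))
    return {field: sorted(labels) for field, labels in schema.items()}


def completeness(documents, field):
    """Share of documents where the field is present and not None."""
    if not documents:
        return 0.0
    filled = sum(1 for doc in documents if doc.get(field) is not None)
    return filled / len(documents)


# Made-up sample records standing in for documents from a MongoDB collection.
docs = [
    {"name": "Acme", "contact": "info@acme.example", "site": "https://acme.example"},
    {"name": "Beta", "contact": "sales@beta.example", "employees": 12},
    {"name": "Gamma", "contact": None, "site": "not available"},
]

schema = infer_schema(docs)
contact_completeness = completeness(docs, "contact")
```

In this sketch a field that mixes labels (for example `site` holding both a URL and free text) shows up with several entries in the inferred schema, which is itself a simple data quality signal alongside the completeness ratio.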

The talk focuses on the experiments that preceded the catalog, the technology stacks, the problems being solved, and the limitations.

#datacatalog
#datagovernance

Speakers

Ivan Begtin
Infoculture

Invited experts

Kseniya Tomak
Dodo Engineering
