Talk

The MDM That Stores Nothing: How to Match Data Without Centralizing It

In Russian

MDM systems are usually built around a single central store of master data. In real corporate and government landscapes, however, data cannot always be copied into a separate environment: regulatory constraints, security requirements, and distributed data ownership get in the way.

In this talk, I will explain how we built an MDM system that matches data but does not store it within its own perimeter. The main focus will be on the evolution of the matching algorithm, which changed several times over the course of the project, mostly in favor of speed.

We will discuss how we developed the matching rules, how we handled the fact that people change their last names (and not only their last names), and how transitive matching lets us assemble a more complete golden record from several records.
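To make the idea of transitive matching concrete, here is a minimal sketch in Python. It is an illustration and not the algorithm from the talk: the function names, the sample records, and the "first non-empty value wins" merge rule are all assumptions. If record A matches B on one attribute and B matches C on another, all three end up in one cluster even though A and C share nothing directly.

```python
from collections import defaultdict


def find(parent, x):
    """Return the cluster representative of x (union-find with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster(record_ids, matched_pairs):
    """Group record ids into clusters using pairwise matches transitively."""
    parent = {r: r for r in record_ids}
    for a, b in matched_pairs:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[rb] = ra  # merge the two clusters
    groups = defaultdict(list)
    for r in record_ids:
        groups[find(parent, r)].append(r)
    return sorted(sorted(g) for g in groups.values())


def golden_record(records):
    """Merge records field by field; the first non-empty value wins.

    This survivorship rule is a placeholder assumption, not a recommendation.
    """
    golden = {}
    for rec in records:
        for field, value in rec.items():
            if value and not golden.get(field):
                golden[field] = value
    return golden


# Hypothetical sample data: the same person before and after a name change.
records = {
    "A": {"last_name": "Ivanova", "passport": "1111", "phone": ""},
    "B": {"last_name": "Petrova", "passport": "1111", "phone": "555"},
    "C": {"last_name": "Petrova", "passport": "", "phone": "555", "email": "p@example.org"},
    "D": {"last_name": "Sidorov", "passport": "2222", "phone": "777"},
}
# Pairwise matches produced by some matching rules (assumed here):
# A~B share a passport number, B~C share a phone number.
pairs = [("A", "B"), ("B", "C")]

clusters = cluster(list(records), pairs)
merged = golden_record([records[r] for r in clusters[0]])
```

In this sketch the clustering step works on record identifiers and pairwise match results only, so it does not need its own copy of the source attributes. Transitive closure can also chain false matches together, which is one reason the quality of the pairwise rules matters.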

The key question of the talk: can an MDM system match master data without becoming a leaky jar of data that is bound to spill sooner or later?
