How We Build a Distributed Tracing System in Which You Can Lose Data

  • In Russian
  • Presentation (PDF)

Data processing and delivery systems often have a strict reliability requirement: all data must be delivered.

At Avito, we are building a log collection and distributed tracing system that processes over 15 million events per second from more than 2,000 services, and we can lose data!

We'll look at the architecture of our system and the tricks that become possible once strong delivery guarantees are dropped. How do we discard data if we don't want to store everything, and how do we figure out which data we actually need? How do we keep data flowing in the face of node and data-centre failures? The focus is on the system's architecture and its evolution, but we'll touch on the domain of tracing and log collection as well.
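The abstract doesn't spell out how data gets discarded, but the standard way to "lose" trace data on purpose is probabilistic sampling keyed on the trace ID, so that each trace is either kept whole or dropped whole across every service in the call chain. A minimal sketch using the OpenTelemetry Go SDK (the 1% ratio is an illustrative number, not Avito's actual setting):

```go
package main

import (
	"context"

	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	// Keep roughly 1% of traces. The keep/drop decision is a deterministic
	// function of the trace ID, so every service in a call chain makes the
	// same choice and a sampled trace is stored complete end to end.
	sampler := sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.01))

	tp := sdktrace.NewTracerProvider(sdktrace.WithSampler(sampler))
	defer func() { _ = tp.Shutdown(context.Background()) }()

	// Spans created via tp's tracers are now sampled at ~1%.
	_ = tp.Tracer("example")
}
```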

Technologies: data-processing pipelines built on OpenTelemetry, the all-time favourites Kafka and ClickHouse (and the synergy between them), and probabilistic streaming algorithms.
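The abstract doesn't name which probabilistic streaming algorithms are used; a count-min sketch is a representative member of the family. It answers approximate frequency queries over an unbounded event stream in fixed memory, never undercounting and overcounting only by a small, bounded error. A self-contained illustration in Go:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// CountMin is a count-min sketch: a small fixed-size table that
// approximates per-key event counts in a stream of arbitrary length.
type CountMin struct {
	rows, width int
	table       [][]uint64
}

func NewCountMin(rows, width int) *CountMin {
	t := make([][]uint64, rows)
	for i := range t {
		t[i] = make([]uint64, width)
	}
	return &CountMin{rows: rows, width: width, table: t}
}

// hash derives a per-row bucket index by salting FNV-1a with the row number.
func (c *CountMin) hash(row int, key string) int {
	h := fnv.New64a()
	fmt.Fprintf(h, "%d:%s", row, key)
	return int(h.Sum64() % uint64(c.width))
}

// Add records one occurrence of key in every row.
func (c *CountMin) Add(key string) {
	for r := 0; r < c.rows; r++ {
		c.table[r][c.hash(r, key)]++
	}
}

// Estimate returns an upper-biased approximation of key's true count:
// the minimum across rows, since each row can only overcount via collisions.
func (c *CountMin) Estimate(key string) uint64 {
	min := c.table[0][c.hash(0, key)]
	for r := 1; r < c.rows; r++ {
		if v := c.table[r][c.hash(r, key)]; v < min {
			min = v
		}
	}
	return min
}

func main() {
	cm := NewCountMin(4, 1<<16)
	for i := 0; i < 1000; i++ {
		cm.Add("service-a")
	}
	cm.Add("service-b")
	fmt.Println(cm.Estimate("service-a"), cm.Estimate("service-b")) // ~1000, ~1
}
```

This kind of structure is what makes "deciding which data we need" tractable at 15 million events per second: hot keys can be identified without storing per-key exact counters.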
