Apache Spark Extensions for PySpark Integration Testing
How we applied hotfixes to Apache Spark ETL pipelines across hundreds of existing processes without changing their code.

Ilya Kochagin
MTS Web Services (MWS)
Our Trino storage hit the performance ceiling of a single Ceph cluster, so we started spreading every table across several clusters at once. All the sharding logic is hidden in the HAProxy sidecars on our compute nodes, and no new component was added to the architecture. Read throughput rose from 20 GB/s to 60–80 GB/s, and GET latency dropped from minutes to 1–2 seconds.
Avito
Let's talk about testing, finding, and debugging problems in high-load software, as well as supporting storage that relies on third-party vendor solutions.
YADRO
In this talk, I will walk through a practical approach to measuring the performance of self-hosted LLMs.
Cian
In the talk, I will review the current state of the data transformation ecosystem, as well as alternative tools and promising projects that may replace dbt.
Positive Technologies
Let's talk about the key functions needed to manage Iceberg tables and the role the REST Catalog plays in that.
Ostrovok!
This talk shares practical experience optimizing inference and ML serving with GPUStack in the production environment of a corporate AI Portal.
Lemana PRO (Leroy Merlin)
How pgvector works: vector storage, HNSW and IVFFlat algorithms, performance degradation points. An honest breakdown of where the solution holds up and where it doesn't.
Postgres Pro
This talk covers the pitfalls that keep end-user analysts from adopting sketches widely.