Talk

ML Inference Neural Network Services in Yandex Advertising

In Russian
Presentation pdf

How to build efficient neural network inference services at the scale of tens of thousands of CPU cores and hundreds of GPUs, serving a dozen customers.

The talk is aimed at those who: work in MLOps or ML inference; are interested in what inference services look like in Yandex Advertising; have built large systems of services that are constrained by CPU and memory; like to develop their services in C++ and invest in efficiency and optimisations.

Speakers

Invited experts
