
Talk. Date: 23.09 / Start: 00:00 – Finish: 00:00

LLM Ops: Optimization of Inference and ML-serving in a Real Production Cluster

ML/LLMOps

In Russian

This talk shares practical experience optimizing inference and ML serving with GPUStack in the production environment of the corporate AI Portal.

Speakers

Dmitry Ibragimov
Lemana Tech
