Talk

LLM Ops: Optimization of Inference and ML-serving in a Real Production Cluster

In Russian

The talk shares practical experience optimizing inference and ML serving with GPUStack in the production environment of the corporate AI Portal.
