LLM Ops: Optimization of Inference and ML-serving in a Real Production Cluster
In Russian
This talk shares practical experience optimizing inference and ML serving built on GPUStack in the production environment of the corporate AI Portal.