Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve.

Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, Ramachandran Ramjee
Published in: CoRR (2024)