Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve.
Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, Ramachandran Ramjee
Published in: OSDI (2024)
Keyphrases
- response time
- low latency
- resource utilization
- real time
- inference engine
- inference process
- prefetching
- inference mechanism
- resource management
- data transfer
- end to end
- computational complexity
- data sets