SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills.

Amey Agrawal, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee
Published in: CoRR (2023)