SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills.
Amey Agrawal
Ashish Panwar
Jayashree Mohan
Nipun Kwatra
Bhargav S. Gulavani
Ramachandran Ramjee
Published in: CoRR (2023)
Keyphrases
computationally efficient, knowledge base, lightweight, probabilistic reasoning, database, case study, video sequences, lower bound, multiresolution, cost effective, probabilistic inference, bayesian inference