SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills
Amey Agrawal, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee
Published in: CoRR (2023)
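The technique named in the title, piggybacking decode steps onto chunked prefills, can be illustrated with a minimal scheduling sketch. Everything here (function name, chunk size, batch shape) is a hypothetical assumption based only on the title's description, not the paper's actual API: a long prompt's prefill is split into fixed-size chunks, and each chunk is batched together with one token from each in-flight decode stream.

```python
# Hypothetical sketch of SARATHI-style batching (assumed from the title):
# split a long prompt's prefill into fixed-size chunks, and pair each chunk
# with one decode token per ongoing request in the same iteration.

CHUNK_SIZE = 4  # prefill tokens processed per iteration (illustrative value)

def build_batches(prompt_len, num_decode_streams):
    """Return a list of (prefill_tokens, decode_tokens) pairs, one per
    iteration, until the prompt is fully prefilled."""
    batches = []
    for start in range(0, prompt_len, CHUNK_SIZE):
        chunk = min(CHUNK_SIZE, prompt_len - start)
        # each in-flight decode contributes exactly one token this step,
        # so decode latency is amortized across the prefill chunks
        batches.append((chunk, num_decode_streams))
    return batches

print(build_batches(prompt_len=10, num_decode_streams=3))
# [(4, 3), (4, 3), (2, 3)]
```

The point of the sketch is that decode work rides along with every prefill chunk instead of waiting for the full prefill to finish, which is the scheduling idea the title describes.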