SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills.

Amey Agrawal, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee
Published in: CoRR (2023)