
SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills.

Amey Agrawal, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee
Published in: CoRR (2023)