
SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification.

Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Rae Ying Yee Wong, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia
Published in: CoRR (2023)
Keyphrases