PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation.

Branden Butler, Sixing Yu, Arya Mazaheri, Ali Jannesari
Published in: CoRR (2024)