Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference.

Hao (Mark) Chen, Wayne Luk, Ka Fai Cedric Yiu, Rui Li, Konstantin Mishchenko, Stylianos I. Venieris, Hongxiang Fan
Published in: CoRR (2024)
Keyphrases