Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference.
Hao Mark Chen, Wayne Luk, Ka Fai Cedric Yiu, Rui Li, Konstantin Mishchenko, Stylianos I. Venieris, Hongxiang Fan
Published in: CoRR (2024)
Keyphrases
- memory efficient
- parallel hardware
- multithreading
- massively parallel
- low cost
- multi core processors
- computer architecture
- high end
- external memory
- iterative deepening
- real time
- parallel computation
- hardware and software
- shared memory
- processing units
- parallel architectures
- parallel computing
- computing power
- multiple sequence alignment
- decoding process
- computer systems
- parallel execution
- hardware architecture
- parallel processing
- belief networks
- depth first search
- hardware implementation
- bayesian inference
- parallel programming
- inference process
- integral image
- data acquisition
- multi dimensional
- parallel implementation
- bayesian networks