RelayAttention for Efficient Large Language Model Serving with Long System Prompts.

Lei Zhu, Xinjiang Wang, Wayne Zhang, Rynson W. H. Lau
Published in: CoRR (2024)
Keyphrases