RazorAttention: Efficient KV Cache Compression Through Retrieval Heads.

Hanlin Tang, Yang Lin, Jing Lin, Qingsen Han, Shikuan Hong, Yiwu Yao, Gongyi Wang
Published in: CoRR (2024)