RazorAttention: Efficient KV Cache Compression Through Retrieval Heads.
Hanlin Tang
Yang Lin
Jing Lin
Qingsen Han
Shikuan Hong
Yiwu Yao
Gongyi Wang
Published in:
CoRR (2024)
Keyphrases
image compression
efficient indexing
document retrieval
query processing
data retrieval
retrieval accuracy
relevance feedback
information retrieval systems
retrieval systems
coding scheme
retrieval model
R-tree
highly scalable
efficient search
semantic caching