Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference.
Harry Dong
Xinyu Yang
Zhenyu Zhang
Zhangyang Wang
Yuejie Chi
Beidi Chen
Published in: CoRR (2024)
Keyphrases
cost effective
compression algorithm
data compression
neural network
learning algorithm
bayesian networks
multiscale
multiresolution
data access
probabilistic inference
efficient learning