Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference.

Harry Dong, Xinyu Yang, Zhenyu Zhang, Zhangyang Wang, Yuejie Chi, Beidi Chen
Published in: CoRR (2024)
Keyphrases
  • cost effective
  • compression algorithm
  • data compression
  • neural network
  • learning algorithm
  • bayesian networks
  • multiscale
  • multiresolution
  • data access
  • probabilistic inference
  • efficient learning