ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition.

Lu Ye, Ze Tao, Yong Huang, Yang Li
Published in: CoRR (2024)
Keyphrases
  • computationally efficient
  • cost effective
  • data structure
  • computationally expensive
  • learning algorithm
  • multi dimensional
  • lightweight