Reducing Transformer Key-Value Cache Size with Cross-Layer Attention.

William Brandon, Mayank Mishra, Aniruddha Nrusimha, Rameswar Panda, Jonathan Ragan-Kelley
Published in: CoRR (2024)
Keyphrases
  • cross layer
  • application layer
  • wireless networks
  • video streaming
  • mobile ad hoc networks
  • routing protocol
  • multi layer
  • multimedia services
  • computational complexity
  • rate distortion
  • low complexity