Beyond KV Caching: Shared Attention for Efficient LLMs.

Bingli Liao, Danilo Vasconcellos Vargas
Published in: CoRR (2024)
Keyphrases
  • neural network
  • computationally efficient
  • computationally expensive
  • information retrieval
  • social networks
  • knowledge base
  • web applications
  • visual attention