Beyond KV Caching: Shared Attention for Efficient LLMs.

Bingli Liao, Danilo Vasconcellos Vargas

Published in: CoRR (2024)

Keyphrases

neural network
computationally efficient
computationally expensive
information retrieval
social networks
knowledge base
web applications
visual attention