Beyond KV Caching: Shared Attention for Efficient LLMs.
Bingli Liao
Danilo Vasconcellos Vargas
Published in:
CoRR (2024)
Keyphrases
neural network
computationally efficient
computationally expensive
information retrieval
social networks
knowledge base
web applications
visual attention