CHAI: Clustered Head Attention for Efficient LLM Inference.
Saurabh Agarwal
Bilge Acun
Basil Hosmer
Mostafa Elhoushi
Yejin Lee
Shivaram Venkataraman
Dimitris Papailiopoulos
Carole-Jean Wu
Published in:
CoRR (2024)
Keyphrases
real time
efficient learning
data mining
learning algorithm
generative model
computationally efficient
pose estimation
inference engine
focus of attention