CHAI: Clustered Head Attention for Efficient LLM Inference.

Saurabh Agarwal, Bilge Acun, Basil Hosmer, Mostafa Elhoushi, Yejin Lee, Shivaram Venkataraman, Dimitris Papailiopoulos, Carole-Jean Wu
Published in: CoRR (2024)
Keyphrases
  • real time
  • efficient learning
  • data mining
  • learning algorithm
  • generative model
  • computationally efficient
  • pose estimation
  • inference engine
  • focus of attention