NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention.

Tianyi Zhang, Jonah Wonkyu Yi, Bowen Yao, Zhaozhuo Xu, Anshumali Shrivastava
Published in: CoRR (2024)
Keyphrases
  • visual attention
  • focus of attention
  • computer vision
  • search algorithm
  • neural network
  • information systems
  • decision making
  • bayesian networks
  • computationally efficient
  • computationally expensive
  • efficient learning