NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention.

Tianyi Zhang, Jonah Wonkyu Yi, Bowen Yao, Zhaozhuo Xu, Anshumali Shrivastava
Published in: CoRR (2024)
Keyphrases
  • visual attention
  • focus of attention
  • computer vision
  • search algorithm
  • neural network
  • information systems
  • decision making
  • bayesian networks
  • computationally efficient
  • computationally expensive
  • efficient learning