SparQ Attention: Bandwidth-Efficient LLM Inference

Luka Ribar, Ivan Chelombiev, Luke Hudlass-Galley, Charlie Blake, Carlo Luschi, Douglas Orr

Published in: CoRR (2023)

Keyphrases

computationally efficient
databases
focus of attention
neural network
data mining
information systems
data structure
cost effective
visual attention
efficient learning