Sparse Attention Acceleration with Synergistic In-Memory Pruning and On-Chip Recomputation.

Amir Yazdanbakhsh, Ashkan Moradifirouzabadi, Zheng Li, Mingu Kang

Published in: MICRO (2022)

Keyphrases

memory subsystem
low cost
high speed
search space
high density
sparse data
high dimensional
main memory
memory requirements
random access memory
pruning method
speculative execution
analog VLSI
level parallelism
sparse representation
multithreading
single chip
VLSI implementation
short term memory
pruning methods
computing power
focus of attention
visual attention
neural network