Sparse Attention Acceleration with Synergistic In-Memory Pruning and On-Chip Recomputation.

Amir Yazdanbakhsh, Ashkan Moradifirouzabadi, Zheng Li, Mingu Kang

Published in: CoRR (2022)

Keyphrases

high speed
random access memory
memory access
low cost
sparse data
VLSI implementation
memory space
memory requirements
memory usage
digital signal processors
single chip
computational power
high dimensional
real time
memory subsystem
compressive sensing
visual attention
processor core
sparse representation
pruning method
avoid overfitting
analog VLSI
search space
pruning power
level parallelism
programmable logic
pruning algorithm
main memory
computing power
focus of attention