Sparse Attention Acceleration with Synergistic In-Memory Pruning and On-Chip Recomputation.
Amir Yazdanbakhsh, Ashkan Moradifirouzabadi, Zheng Li, Mingu Kang
Published in: MICRO (2022)
Keyphrases
- memory subsystem
- low cost
- high speed
- search space
- high density
- sparse data
- high dimensional
- main memory
- memory requirements
- random access memory
- pruning method
- speculative execution
- analog VLSI
- level parallelism
- sparse representation
- multithreading
- single chip
- VLSI implementation
- short term memory
- pruning methods
- computing power
- focus of attention
- visual attention
- neural network