TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition.
Lizhi Xiang, Miao Yin, Chengming Zhang, Aravind Sukumaran-Rajam, P. Sadayappan, Bo Yuan, Dingwen Tao
Published in: PPoPP (2023)
Keyphrases
- parallel architectures
- low cost
- general purpose
- cost effective
- cellular neural networks
- database systems
- high order
- real time
- graphics hardware
- computational power
- hardware and software
- graphics processors
- vlsi implementation
- decomposition algorithm
- graphics processing units
- highly efficient
- matrix factorization
- efficient implementation
- data structure
- neural network