Optimizing Massively Parallel Winograd Convolution on ARM Processor.
Dongsheng Li, Dan Huang, Zhiguang Chen, Yutong Lu
Published in: ICPP (2021)
Keyphrases
- massively parallel
- mesh connected
- parallel architectures
- processing elements
- parallel computing
- fine grained
- high performance computing
- computer architecture
- parallel computers
- parallel machines
- high speed
- image processing
- floating point unit
- graphics processing units
- distributed memory
- parallel processing
- parallel programming
- parallel execution