A Portable and High-Performance General Matrix-Multiply (GEMM) Library for GPUs and Single-Chip CPU/GPU Systems.
Rahul Garg, Laurie J. Hendren. Published in: PDP (2014)
Keyphrases
- graphics processing units
- highly parallel
- single chip
- general purpose
- floating point
- parallel processing
- gpu implementation
- computing systems
- graphics hardware
- parallel computing
- embedded processors
- parallel programming
- cpu implementation
- graphics processors
- software implementation
- scientific computing
- commodity hardware
- parallel implementation
- Compute Unified Device Architecture
- real time
- signal processor
- parallel computation
- massively parallel
- low power
- parallel architectures
- efficient implementation
- computer systems
- high performance computing
- heterogeneous computing
- multimedia