Performance Tuning of Matrix Multiplication in OpenCL on Different GPUs and CPUs.
Kazuya Matsumoto, Naohito Nakasato, Stanislav G. Sedukhin
Published in: SC Companion (2012)
Keyphrases
- matrix multiplication
- graphics processing units
- parallel programming
- high end
- shared memory
- distributed memory
- general purpose
- multi core systems
- message passing
- parallel processing
- parallel computing
- parallel implementation
- processing units
- parallel execution
- commodity hardware
- gpu implementation
- parallel computation
- massively parallel
- matrix factorization
- graphics processors
- floating point
- real time
- computing systems
- high performance computing
- memory bandwidth
- parallel machines
- oracle database
- parallel algorithm
- efficient implementation
- multi core processors
- parallel architectures
- belief propagation
- hardware design
- memory access
- distributed systems
- message passing interface
- pairwise
- high quality