Achieving Native GPU Performance for Out-of-Card Large Dense Matrix Multiplication.

Jing Wu, Joseph F. JáJá

Published in: Parallel Process. Lett. (2016)

Keyphrases

matrix multiplication
message passing
distributed memory
real time
smart card
parallel implementation
low cost
graphics hardware
gpu implementation
parallel processing
preprocessing
semi supervised
matrix factorization
parallel computing
graphics processors
gpu accelerated