TSM2X: High-performance tall-and-skinny matrix-matrix multiplication on GPUs.
Cody Rivera
Jieyang Chen
Nan Xiong
Jing Zhang
Shuaiwen Leon Song
Dingwen Tao
Published in:
J. Parallel Distributed Comput. (2021)
Keyphrases
matrix multiplication
distributed memory
message passing
graphics processing units
matrix factorization
shared memory
parallel implementation
scientific computing
parallel programming
parallel processing
parallel computers
lower bound
markov random field
highly parallel