Stream-K: Work-centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU.
Muhammad Osama, Duane Merrill, Cris Cecka, Michael Garland, John D. Owens
Published in: CoRR (2023)
Keyphrases
- matrix multiplication
- distributed memory
- parallel implementation
- shared memory
- message passing
- parallel computation
- real time
- parallel programming
- parallel computing
- parallel processing
- graphics processing units
- data streams
- parallel algorithm
- cluster of workstations
- matrix factorization
- parallel machines
- graphics hardware
- parallel hardware
- parallel architectures
- three dimensional
- computing systems
- decomposition method
- higher order
- 3D objects
- Bayesian networks
- high quality