Stream-K: Work-Centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU.
Muhammad Osama, Duane Merrill, Cris Cecka, Michael Garland, John D. Owens
Published in: PPoPP (2023)
Keyphrases
- matrix multiplication
- distributed memory
- parallel implementation
- shared memory
- real time
- parallel computation
- message passing
- parallel computing
- parallel programming
- parallel processing
- cluster of workstations
- graphics processing units
- parallel hardware
- data streams
- matrix factorization
- parallel architectures
- image processing
- parallel algorithm
- decomposition method
- graphics hardware
- parallel machines
- computer architecture
- multi view
- lower bound
- high quality