Automatic Generation of Distributed-Memory Mappings for Tensor Computations.
Martin Kong, Raneem Abu Yosef, Atanas Rountev, P. Sadayappan
Published in: SC (2023)
Keyphrases
- distributed memory
- matrix multiplication
- shared memory
- fine grain
- parallel implementation
- multiprocessor systems
- ibm sp
- high order
- scientific computing
- parallel computers
- data parallelism
- parallel architecture
- parallel machines
- multi processor
- higher order
- pairwise
- parallel computation
- multithreading
- message passing
- stereo matching