A high-performance matrix-multiplication algorithm on a distributed-memory parallel computer, using overlapped communication.
Ramesh C. Agarwal, Fred G. Gustavson, Mohammad Zubair
Published in: IBM J. Res. Dev. (1994)
Keyphrases
- distributed memory
- matrix multiplication
- parallel implementation
- IBM SP
- parallel computers
- shared memory
- parallel machines
- optimal solution
- parallel architecture
- parallel processing
- hardware implementation
- matching algorithm
- computer architecture
- energy function
- dynamic programming
- high resolution
- computational complexity