A parallel block implementation of Level-3 BLAS for MIMD vector processors.
Michel J. Daydé, Iain S. Duff, Antoine Petitet
Published in: ACM Trans. Math. Softw. (1994)
Keyphrases
- shared memory
- distributed memory
- parallel algorithm
- parallel architecture
- parallel computers
- parallel processing
- parallel implementation
- cluster of workstations
- parallel version
- parallel programming
- parallel computation
- parallel computing
- coarse grained
- highly optimized
- scientific computing
- higher level
- multiprocessor systems
- highly parallel
- high end
- single processor
- general purpose
- levels of abstraction
- message passing
- parallel processors
- multi core processors
- data parallelism
- depth first search
- message passing interface
- shared memory multiprocessor
- processor array
- computer architecture
- parallel machines
- single instruction multiple data
- feature vectors
- image processing
- processing elements
- video sequences
- load balancing
- linear algebra
- signal processing
- efficient implementation