The RISC BLAS: a blocked implementation of level 3 BLAS for RISC processors.
Michel J. Daydé, Iain S. Duff
Published in: ACM Trans. Math. Softw. (1999)
Keyphrases
- instruction set
- highly optimized
- application specific
- linear algebra
- computer architecture
- general purpose
- floating point
- hardware architecture
- embedded systems
- connected component labeling
- parallel processing
- special purpose
- computation intensive
- scientific computing
- efficient implementation
- parallel algorithm
- multiresolution
- lower level
- higher level
- high speed
- data sets