High performance and memory efficient implementation of matrix multiplication on FPGAs.
Guiming Wu, Yong Dou, Miao Wang
Published in: FPT (2010)
Keyphrases
- efficient implementation
- matrix multiplication
- distributed memory
- hardware implementation
- highly parallel
- parallel architectures
- message passing
- efficient processing
- field programmable gate array
- shared memory
- parallel computers
- graphics processing units
- image processing
- distributed systems
- active set
- parallel implementation
- matrix factorization
- computer vision