High-Performance Matrix Multiply on a Massively Multithreaded Fiteng1000 Processor.
Jie Liu, Lihua Chi, Chunye Gong, Han Xu, Jie Jiang, Yihui Yan, Qingfeng Hu
Published in: ICA3PP (2) (2012)
Keyphrases
- distributed memory
- parallel computing
- shared memory
- multithreading
- scientific computing
- computation intensive
- parallel implementation
- massively parallel
- computer architecture
- highly parallel
- linear algebra
- singular value decomposition
- embedded processors
- multiprocessor systems
- multi user
- message passing
- parallel processing
- real time
- functional verification
- parallel architecture
- single chip
- singular values
- low rank
- high speed