Mapping Parallel Matrix Multiplication in GotoBLAS2 to the AMD Versal ACAP for Deep Learning.
Jie Lei, Enrique S. Quintana-Ortí
Published in: CoRR (2024)
Keyphrases
- deep learning
- matrix multiplication
- distributed memory
- parallel implementation
- shared memory
- unsupervised feature learning
- unsupervised learning
- machine learning
- message passing
- mental models
- deep architectures
- matrix factorization
- parallel computing
- parallel machines
- weakly supervised
- text mining
- training set
- learning algorithm