Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?
Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda
Published in: CoRR (2017)
Keyphrases
- deep learning
- parallel implementation
- parallel computing
- unsupervised learning
- clustering algorithm
- unsupervised feature learning
- parallel programming
- machine learning
- message passing
- mental models
- general purpose
- data points
- massively parallel
- deep architectures
- shared memory
- parallel algorithm
- graphics processing units
- weakly supervised