Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation.
Ammar Ahmad Awan, Jeroen Bédorf, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda
Published in: CCGRID (2019)
Keyphrases
- scalable distributed
- parallel implementation
- training process
- general purpose
- parallel computing
- shared memory
- neural network
- training set
- parallel programming
- training phase
- real time
- parallelization strategy
- file system
- parallel algorithm
- online learning
- training algorithm
- message passing
- parallel computation
- distributed systems
- message passing interface
- supervised learning
- data sets