Accelerating CPU-based Distributed DNN Training on Modern HPC Clusters using BlueField-2 DPUs.
Arpan Jain
Nawras Alnaasan
Aamir Shafi
Hari Subramoni
Dhabaleswar K. Panda
Published in:
HOTI (2021)
Keyphrases
training process
distributed systems
fault tolerance
clustering algorithm
high performance computing
cooperative
fault tolerant
distributed environment
training algorithm
training examples
data clustering
neural network
distributed data
communication cost
mobile agents
training samples
training set
data distribution
data points
hierarchical clustering
online learning
fuzzy clustering
input data
multi agent
peer to peer
subspace clustering
training phase
general purpose