Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning.

Hao Yu, Sen Yang, Shenghuo Zhu

Published in: AAAI (2019)

Keyphrases

deep learning
faster convergence
model averaging
step size
unsupervised learning
machine learning
convergence speed
pso algorithm
supervised classification
weakly supervised
global optimization
global optimum
mental models
bayesian methods
convergence rate
posterior distribution
learning algorithm
bayesian network structures
particle swarm optimization
active learning
search space
co-occurrence