Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning.
Hao Yu, Sen Yang, Shenghuo Zhu
Published in: AAAI (2019)
Keyphrases
- deep learning
- faster convergence
- model averaging
- step size
- unsupervised learning
- machine learning
- convergence speed
- pso algorithm
- supervised classification
- weakly supervised
- global optimization
- global optimum
- mental models
- bayesian methods
- convergence rate
- posterior distribution
- learning algorithm
- bayesian network structures
- particle swarm optimization
- active learning
- search space
- co-occurrence