DaSGD: Squeezing SGD Parallelization Performance in Distributed Training Using Delayed Averaging.
Qinggang Zhou, Yawen Zhang, Pengcheng Li, Xiaoyong Liu, Jun Yang, Runsheng Wang, Ru Huang. Published in: CoRR (2020)
Keyphrases
- stochastic gradient descent
- lightweight
- training set
- distributed systems
- computer networks
- parallel execution
- supervised learning
- distributed environment
- worst case
- training speed
- shared memory
- loss function
- neural network
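The title names the core idea: in distributed training, delayed averaging lets each worker keep applying local SGD updates while the cross-worker model average is computed in the background and merged a few steps late, so communication overlaps with computation. Below is a minimal single-process sketch of that pattern on a toy quadratic objective; the delay length, the merge rule, and all variable names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of delayed-averaging SGD (not the DaSGD authors' code).
# Workers apply local SGD updates immediately; the global model average is
# "in flight" for DELAY steps before it is merged, simulating communication
# latency that overlaps with computation.
import numpy as np

rng = np.random.default_rng(0)
W, D, DELAY, STEPS, LR = 4, 10, 2, 100, 0.1  # workers, dims, delay, steps, step size

target = rng.normal(size=D)                   # toy objective: ||x - target||^2
models = [rng.normal(size=D) for _ in range(W)]
pending = []                                  # (ready_step, averaged_model) in flight

for step in range(STEPS):
    # Launch an all-reduce average of the current models; its result only
    # becomes available DELAY steps from now.
    pending.append((step + DELAY, np.mean(models, axis=0)))

    # Local computation proceeds without waiting for the average.
    for i in range(W):
        grad = 2.0 * (models[i] - target) + rng.normal(scale=0.1, size=D)
        models[i] -= LR * grad

    # Merge any delayed average that has just "arrived".
    while pending and pending[0][0] <= step:
        _, avg = pending.pop(0)
        for i in range(W):
            models[i] = 0.5 * (models[i] + avg)  # pull workers toward the stale average

print("mean distance to optimum:",
      np.mean([np.linalg.norm(m - target) for m in models]))
```

Because the merge uses an average that is DELAY steps stale, no worker ever blocks on communication; the cost is a bounded staleness in the consensus step rather than idle time on each iteration.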