The Marginal Value of Momentum for Small Learning Rate SGD.
Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li
Published in: CoRR (2023)
Keyphrases
- learning rate
- learning algorithm
- convergence rate
- error function
- convergence speed
- hidden layer
- training speed
- weight vector
- multilayer neural networks
- stochastic gradient descent
- rapid convergence
- adaptive learning rate
- training algorithm
- delta bar delta
- convergence theorem
- activation function
- small number
- high accuracy
- bp neural network algorithm