The Marginal Value of Momentum for Small Learning Rate SGD.
Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li
Published in: ICLR (2024)
Keyphrases
- learning rate
- convergence rate
- learning algorithm
- hidden layer
- training speed
- error function
- adaptive learning rate
- convergence speed
- rapid convergence
- weight vector
- training algorithm
- activation function
- BP neural network algorithm
- small number
- stochastic gradient descent
- multilayer neural networks
- training data
- convergence theorem
- genetic algorithm
- neural network