The Marginal Value of Momentum for Small Learning Rate SGD

Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li

Published in: ICLR (2024)

Keyphrases

learning rate
convergence rate
learning algorithm
hidden layer
training speed
error function
adaptive learning rate
convergence speed
rapid convergence
weight vector
training algorithm
activation function
BP neural network algorithm
small number
stochastic gradient descent
multilayer neural networks
training data
convergence theorem
genetic algorithm
neural network