Non-convergence of Adam and other adaptive stochastic gradient descent optimization methods for non-vanishing learning rates.
Steffen Dereich, Robin Graeber, Arnulf Jentzen
Published in: CoRR (2024)
Keyphrases
- optimization methods
- learning rate
- stochastic gradient descent
- convergence rate
- step size
- global convergence
- weight vector
- convergence speed
- optimization method
- simulated annealing
- optimization problems
- least squares
- matrix factorization
- learning algorithm
- loss function
- global optimization
- optimization algorithm
- particle swarm optimization
- active learning
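
To make the title's setting concrete, below is a minimal Python sketch of the standard Adam update run with a constant (non-vanishing) learning rate; the toy stochastic objective, the parameter values, and the function name `adam_step` are illustrative assumptions and are not taken from the paper itself.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update with a fixed (non-vanishing) learning rate lr."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias corrections
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Illustrative stochastic quadratic objective E[(theta - xi)^2] / 2 with xi ~ N(0, 1),
# whose minimizer is theta = 0; the gradient sample is theta - xi.
rng = np.random.default_rng(0)
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 10_001):
    grad = theta - rng.normal()
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)  # with a constant lr the iterates keep fluctuating near, not converging to, 0
```

In this sketch the step size `lr` stays fixed across iterations, which is what "non-vanishing learning rate" refers to; a vanishing schedule would instead let the step size tend to zero as `t` grows.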