Non-convergence of Adam and other adaptive stochastic gradient descent optimization methods for non-vanishing learning rates.
Steffen Dereich, Robin Graeber, Arnulf Jentzen
Published in: CoRR (2024)
Keyphrases
- optimization methods
- learning rate
- stochastic gradient descent
- convergence rate
- step size
- global convergence
- weight vector
- convergence speed
- optimization method
- simulated annealing
- optimization problems
- least squares
- matrix factorization
- learning algorithm
- loss function
- global optimization
- optimization algorithm
- particle swarm optimization
- active learning
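
To make the title's setting concrete, below is a minimal Python sketch of the standard Adam update run with a constant (non-vanishing) learning rate; the toy stochastic objective, the parameter values, and the function name `adam_step` are illustrative assumptions and are not taken from the paper itself.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update with a fixed (non-vanishing) learning rate lr."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias corrections
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Illustrative stochastic quadratic objective E[(theta - xi)^2] / 2 with xi ~ N(0, 1),
# whose minimizer is theta = 0; the gradient sample is theta - xi.
rng = np.random.default_rng(0)
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 10_001):
    grad = theta - rng.normal()
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)  # with a constant lr the iterates keep fluctuating near, not converging to, 0
```

In this sketch the step size `lr` stays fixed across iterations, which is what "non-vanishing learning rate" refers to; a vanishing schedule would instead let the step size tend to zero as `t` grows.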