On Avoiding Local Minima Using Gradient Descent With Large Learning Rates.

Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich

Published in: CoRR (2022)

Keyphrases

learning rate
error function
update rule
learning algorithm
convergence rate
cost function
global minimum
uniform convergence
convergence speed
covering numbers
convergence theorem
Gaussian kernels
objective function
loss function
simulated annealing
weight vector
upper bound
lower bound