Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio.
Stanislaw Jastrzebski
Zachary Kenton
Devansh Arpit
Nicolas Ballas
Asja Fischer
Yoshua Bengio
Amos J. Storkey
Published in:
ICANN (3) (2018)
Keyphrases
learning rate
stochastic gradient descent
weight vector
convergence rate
step size
learning algorithm
convergence speed
least squares
loss function
matrix factorization
online algorithms
support vector machine
graphical models