A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima.
Zeke Xie
Issei Sato
Masashi Sugiyama
Published in: ICLR (2021)
Keyphrases
deep learning
stochastic gradient descent
least squares
loss function
machine learning
unsupervised learning
matrix factorization
step size
mental models
random forests
decision trees
weakly supervised
multiscale
reinforcement learning
small number