Fast Mixing of Stochastic Gradient Descent with Normalization and Weight Decay.

Zhiyuan Li, Tianhao Wang, Dingli Yu

Published in: NeurIPS (2022)

Keyphrases

stochastic gradient descent
least squares
loss function
matrix factorization
step size
random forests
support vector machine
online algorithms
importance sampling
logistic regression
regularization parameter
weight vector
decision trees
multiple kernel learning
convergence speed
benchmark datasets
cost function
evolutionary algorithm