SGD Generalizes Better Than GD (And Regularization Doesn't Help)

Idan Amir, Tomer Koren, Roi Livni

Published in: COLT (2021)

Keyphrases

stochastic gradient descent
matrix factorization
loss function
least squares
step size
regularization parameter
parameter selection
machine learning
worst case
random forests
regularization term
regularization method
regularization methods
projection operator