SGD Generalizes Better Than GD (And Regularization Doesn't Help).
Idan Amir, Tomer Koren, Roi Livni
Published in: CoRR (2021)
Keyphrases
- stochastic gradient descent
- loss function
- regularization parameter
- matrix factorization
- support vector machine
- least squares
- model selection
- step size
- projection operator
- genetic algorithm
- smoothing parameter
- empirical risk minimization
- regularization methods
- regularization framework
- online algorithms
- multiple kernel learning
- data dependent
- random forests
- maximum likelihood
- feature selection