Fast Mixing of Stochastic Gradient Descent with Normalization and Weight Decay.
Zhiyuan Li, Tianhao Wang, Dingli Yu
Published in: NeurIPS (2022)
Keyphrases
- stochastic gradient descent
- least squares
- loss function
- matrix factorization
- step size
- random forests
- support vector machine
- online algorithms
- importance sampling
- logistic regression
- regularization parameter
- weight vector
- decision trees
- multiple kernel learning
- convergence speed
- benchmark datasets
- cost function
- evolutionary algorithm