Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent.
Xiaowu Dai, Yuhua Zhu
Published in: CoRR (2018)
Keyphrases
- stochastic gradient descent
- online algorithms
- early stopping
- loss function
- least squares
- matrix factorization
- step size
- random forests
- online learning
- support vector machine
- worst case
- regularization parameter
- multiple kernel learning
- weight vector
- average case
- logistic regression
- e-learning
- importance sampling
- cost function