Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent.

Xiaowu Dai, Yuhua Zhu

Published in: CoRR (2018)

Keyphrases

stochastic gradient descent
online algorithms
early stopping
loss function
least squares
matrix factorization
step size
random forests
online learning
support vector machine
worst case
regularization parameter
multiple kernel learning
weight vector
average case
logistic regression
e-learning
importance sampling
cost function