Non-convergence of stochastic gradient descent in the training of deep neural networks.
Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek. Published in: J. Complex. (2021)
Keyphrases
- stochastic gradient descent
- neural network
- step size
- early stopping
- least squares
- loss function
- matrix factorization
- training speed
- convergence rate
- random forests
- support vector machine
- training process
- weight vector
- training algorithm
- convergence speed
- regularization parameter
- online algorithms
- genetic algorithm
- linear svm
- multiple kernel learning
- back propagation
- cross validation
- online learning