Non-convergence of stochastic gradient descent in the training of deep neural networks.
Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek
Published in: CoRR (2020)
Keyphrases
- stochastic gradient descent
- neural network
- least squares
- loss function
- step size
- early stopping
- matrix factorization
- training speed
- convergence rate
- random forests
- training algorithm
- training process
- convergence speed
- support vector machine
- weight vector
- regularization parameter
- back propagation
- multiple kernel learning
- importance sampling
- online algorithms
- linear svm
- supervised learning
- iterative algorithms
- support vector
- collaborative filtering
- training data
- learning rate
- logistic regression
- online learning
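As a rough illustration of the topic named in the title and several of the keyphrases (stochastic gradient descent, least squares, loss function, step size, back propagation, learning rate), here is a minimal sketch of plain SGD with a constant step size training a one-hidden-layer ReLU network on a synthetic least-squares regression problem. This is not the paper's construction; the architecture, data, seed, and all hyperparameters are illustrative assumptions.

```python
# Minimal illustrative sketch (not the paper's setting): plain SGD with a
# constant step size on a one-hidden-layer ReLU network, least-squares loss.
# All shapes, seeds, and hyperparameters below are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: scalar inputs, targets from a smooth function.
X = rng.uniform(-1.0, 1.0, size=(1024, 1))
Y = np.sin(np.pi * X)

# One hidden layer with ReLU activation.
width = 32
W1 = rng.normal(0.0, 1.0, size=(1, width))
b1 = np.zeros(width)
W2 = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, 1))
b2 = np.zeros(1)

step_size = 0.05   # constant learning rate
batch_size = 16

for step in range(5000):
    idx = rng.integers(0, len(X), size=batch_size)
    x, y = X[idx], Y[idx]

    # Forward pass.
    h_pre = x @ W1 + b1          # (batch, width)
    h = np.maximum(h_pre, 0.0)   # ReLU
    pred = h @ W2 + b2           # (batch, 1)

    # Least-squares (mean squared error) loss on the minibatch.
    err = pred - y
    loss = 0.5 * np.mean(err ** 2)

    # Backward pass (manual backpropagation).
    g_pred = err / batch_size
    g_W2 = h.T @ g_pred
    g_b2 = g_pred.sum(axis=0)
    g_h = g_pred @ W2.T
    g_h_pre = g_h * (h_pre > 0.0)
    g_W1 = x.T @ g_h_pre
    g_b1 = g_h_pre.sum(axis=0)

    # SGD update with constant step size.
    W1 -= step_size * g_W1
    b1 -= step_size * g_b1
    W2 -= step_size * g_W2
    b2 -= step_size * g_b2

    if step % 1000 == 0:
        print(f"step {step:5d}  minibatch loss {loss:.4f}")
```

Whether such an iteration converges depends on the random initialization and the step-size choice, which is the kind of question the paper's title refers to; the sketch above only fixes one concrete instance of the training loop.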