The Implicit Biases of Stochastic Gradient Descent on Deep Neural Networks with Batch Normalization.
Ziquan Liu, Yufei Cui, Jia Wan, Yu Mao, Antoni B. Chan
Published in: CoRR (2021)
Keyphrases
- stochastic gradient descent
- neural network
- online algorithms
- step size
- least squares
- matrix factorization
- loss function
- online learning
- random forests
- lower bound
- training process
- support vector machine
- worst case
- multiple kernel learning
- weight vector
- genetic algorithm
- regularization parameter
- importance sampling
- collaborative filtering
- convergence rate
- asymptotically optimal
- learning algorithm