SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation.
Robert M. Gower, Othmane Sebbouh, Nicolas Loizou. Published in: AISTATS (2021)
Keyphrases
- learning rate
- convergence rate
- covering numbers
- learning algorithm
- Gaussian kernels
- uniform convergence
- weight vector
- stochastic gradient descent
- VC dimension
- convex optimization
- learning theory
- convergence speed
- global optimization
- special case
- particle swarm optimization
- upper bound
- evolutionary algorithm
- feature selection