Convergence rates and approximation results for SGD and its continuous-time counterpart.
Xavier Fontaine, Valentin De Bortoli, Alain Durmus
Published in: COLT (2021)
Keyphrases
- convergence rate
- step size
- stochastic gradient descent
- convergence speed
- learning rate
- Markov chain
- global convergence
- mutation operator
- Lp norm
- stopping criterion
- optimal control
- primal dual
- dynamical systems
- Gaussian kernels
- machine learning
- numerical stability
- approximation algorithms
- optimization algorithm
- evolutionary algorithm
- conjugate gradient