Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize
Mert Gürbüzbalaban, Yuanhan Hu, Umut Simsekli, Lingjiong Zhu
Published in: Trans. Mach. Learn. Res. (2023)
Keyphrases
- approximate dynamic programming
- step size
- stochastic gradient descent
- linear program
- cost function
- convergence rate
- reinforcement learning
- dynamic programming
- control policy
- convergence speed
- faster convergence
- average cost
- search direction
- policy iteration
- wavelet coefficients
- optimization algorithm
- NP-hard
- feature extraction
- quasi-Newton
- machine learning
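The title contrasts constant, cyclic, and randomized stepsize schedules for stochastic gradient descent. Below is a minimal sketch of that setting: SGD on a one-dimensional quadratic with additive Gaussian gradient noise, where a length-1 schedule gives a constant stepsize and a longer list is cycled through. This is an illustrative toy, not the paper's experimental setup; the function name and parameters are my own assumptions.

```python
import random

def sgd_quadratic(stepsizes, n_steps=1000, noise=0.5, seed=0):
    """Run SGD on f(x) = x^2 / 2 with Gaussian gradient noise.

    `stepsizes` is cycled over: a one-element list is a constant
    stepsize, a longer list a cyclic schedule. (Toy illustration of
    the schedules named in the title, not the paper's method.)
    """
    rng = random.Random(seed)
    x = 1.0
    for k in range(n_steps):
        eta = stepsizes[k % len(stepsizes)]      # cyclic schedule
        grad = x + noise * rng.gauss(0, 1)       # noisy gradient of x^2/2
        x -= eta * grad
    return x

# Constant stepsize 0.5 vs a cyclic schedule with the same average:
x_const  = sgd_quadratic([0.5])
x_cyclic = sgd_quadratic([0.1, 0.9])
```

Comparing the distribution of iterates under the two schedules (e.g. over many seeds) is how one would probe the tail behavior the title refers to; both runs here stay stable because each stepsize is below the critical threshold for this quadratic.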