Fast two-time-scale stochastic gradient method with applications in reinforcement learning.
Sihan Zeng, Thinh T. Doan
Published in: COLT (2024)
Keyphrases
- gradient method
- reinforcement learning
- actor critic
- policy gradient
- convergence rate
- step size
- convex formulation
- optimization methods
- negative matrix factorization
- function approximation
- reinforcement learning algorithms
- machine learning
- temporal difference
- model free
- Markov decision processes
- optimal policy
- text categorization
- multi objective
- feature space
- keywords
- training data