A nearly Blackwell-optimal policy gradient method.
Vektor Dewanto, Marcus Gallagher
Published in: CoRR (2021)
Keyphrases
- optimal policy
- gradient method
- convergence rate
- markov decision processes
- decision problems
- finite horizon
- dynamic programming
- multistage
- long run
- infinite horizon
- state space
- reinforcement learning
- step size
- finite state
- state dependent
- bayesian reinforcement learning
- negative matrix factorization
- sufficient conditions
- policy iteration
- average cost
- markov decision process
- machine learning
- average reward
- actor critic
- partially observable markov decision processes
- optimization methods
- reward function
- policy gradient
- learning algorithm