Policy Gradient for Continuing Tasks in Discounted Markov Decision Processes.
Santiago Paternain, Juan Andrés Bazerque, Alejandro Ribeiro
Published in: IEEE Trans. Autom. Control. (2022)
Keyphrases
- markov decision processes
- average reward
- policy gradient
- reinforcement learning algorithms
- reinforcement learning
- state space
- optimal policy
- finite state
- policy iteration
- dynamic programming
- average cost
- actor critic
- finite horizon
- partially observable
- infinite horizon
- partially observable markov decision processes
- long run
- discounted reward
- markov decision process
- reward function
- action space
- stochastic games
- reinforcement learning methods
- state action
- optimal control
- transfer learning