Policy Gradient for Continuing Tasks in Non-stationary Markov Decision Processes.
Santiago Paternain, Juan Andrés Bazerque, Alejandro Ribeiro
Published in: CoRR (2020)
Keyphrases
- non-stationary
- markov decision processes
- policy gradient
- average reward
- reinforcement learning algorithms
- reinforcement learning
- optimal policy
- finite horizon
- finite state
- policy iteration
- state space
- dynamic programming
- actor critic
- partially observable markov decision processes
- random fields
- stochastic games
- partially observable
- average cost
- infinite horizon
- reward function
- model free
- markov decision process
- state action
- supervised learning
- computational complexity