Policy iteration for robust nonstationary Markov decision processes.
Saumya Sinha, Archis Ghate
Published in: Optim. Lett. (2016)
Keyphrases
- markov decision processes
- non stationary
- policy iteration
- finite horizon
- optimal policy
- state space
- reinforcement learning
- finite state
- policy evaluation
- sample path
- model free
- average reward
- markov decision process
- dynamic programming
- fixed point
- infinite horizon
- factored mdps
- decision processes
- transition matrices
- reinforcement learning algorithms
- markov decision problems
- action space
- random fields
- partially observable
- least squares
- average cost
- discounted reward
- objective function
- temporal difference
- multistage
- markov games