Safe Policy Iteration.
Matteo Pirotta, Marcello Restelli, Alessio Pecorino, Daniele Calandriello
Published in: ICML (3) (2013)
Keyphrases
- policy iteration
- markov decision processes
- model free
- reinforcement learning
- fixed point
- least squares
- optimal policy
- finite state
- sample path
- temporal difference
- average reward
- infinite horizon
- policy evaluation
- optimal control
- markov decision problems
- linear programming
- convergence rate
- markov decision process
- function approximation
- dynamic programming
- average cost
- machine learning
- discounted reward
- random variables
- markov random field
- state space
- probability distribution