Policy Iteration Based on Stochastic Factorization.
André da Motta Salles Barreto, Joelle Pineau, Doina Precup
Published in: J. Artif. Intell. Res. (2014)
Keyphrases
- policy iteration
- stochastic approximation
- sample path
- markov decision processes
- model free
- fixed point
- reinforcement learning
- optimal policy
- least squares
- approximate dynamic programming
- finite state
- markov decision process
- average reward
- infinite horizon
- temporal difference
- asymptotic analysis
- convergence rate
- policy evaluation
- stochastic model
- markov decision problems
- pairwise
- markov chain
- function approximation
- linear programming
- machine learning
- dynamic programming
- optimal control
- monte carlo
- optimal solution
- sufficient conditions