Temporal Difference-Based Policy Iteration for Optimal Control of Stochastic Systems.
Kang Cheng, Shumin Fei, Kanjian Zhang, Xiaomei Liu, Haikun Wei
Published in: J. Optim. Theory Appl. (2014)
Keyphrases
- policy iteration
- stochastic systems
- optimal control
- sample path
- temporal difference
- reinforcement learning
- policy evaluation
- model-free
- control problems
- function approximation
- dynamic programming
- infinite horizon
- evaluation function
- average reward
- actor-critic
- Markov decision processes
- action selection
- Monte Carlo
- reinforcement learning algorithms
- step size
- function approximators
- optimal policy
- state space
- control strategy
- Markov decision problems
- machine learning
- learning algorithm
- average cost
- confidence intervals
- supervised learning
- decision making