Balancing Value Iteration and Policy Iteration for Discrete-Time Control.
Biao Luo, Yin Yang, Huai-Ning Wu, Tingwen Huang
Published in: IEEE Trans. Syst. Man Cybern. Syst. (2020)
Keyphrases
- policy iteration
- markov decision processes
- finite state
- optimal control
- optimal policy
- model free
- reinforcement learning
- least squares
- markov decision process
- sample path
- fixed point
- infinite horizon
- control problems
- temporal difference
- average reward
- policy evaluation
- state space
- dynamic programming
- markov chain
- markov decision problems
- control system
- factored mdps
- average cost
- convergence rate
- linear programming
- control policy
- action selection
- multi agent