Continuous-Time Time-Varying Policy Iteration.
Qinglai Wei, Zehua Liao, Zhanyu Yang, Benkai Li, Derong Liu
Published in: IEEE Trans. Cybern. (2020)
Keyphrases
- policy iteration
- optimal control
- Markov decision processes
- state space
- fixed point
- reinforcement learning
- Markov chain
- model free
- finite state
- optimal policy
- least squares
- sample path
- policy evaluation
- infinite horizon
- average reward
- dynamical systems
- dynamic programming
- factored MDPs
- temporal difference
- convergence rate
- Markov processes
- linear programming
- Markov decision process
- neural network
- discounted reward
- Markov decision problems
- stochastic processes
- Monte Carlo
- utility function
- control strategy