Policy iteration based Q-learning for linear nonzero-sum quadratic differential games.
Xinxing Li, Zhihong Peng, Li Liang, Wenzhong Zha
Published in: Sci. China Inf. Sci. (2019)
Keyphrases
- policy iteration
- Markov decision processes
- linear approximation
- model free
- reinforcement learning
- linear functions
- optimal policy
- fixed point
- stochastic approximation
- least squares
- sample path
- finite state
- temporal difference
- Markov decision process
- average reward
- objective function
- stochastic games
- temporal difference learning
- convergence rate
- infinite horizon
- linear programming
- function approximation
- state space
- discounted reward
- reinforcement learning algorithms
- actor critic
- dynamic programming
- policy evaluation
- optimal control
- approximate policy iteration
- reward function
- machine learning
- Markov decision problems
- linear program
- graph cuts
- graphical models