Discrete-Time Nonzero-Sum Games for Multiplayer Using Policy-Iteration-Based Adaptive Dynamic Programming Algorithms.
Huaguang Zhang, He Jiang, Chaomin Luo, Geyang Xiao
Published in: IEEE Trans. Cybern. (2017)
Keyphrases
- dynamic programming algorithms
- policy iteration
- optimal policy
- Markov decision processes
- Markov decision problems
- finite state
- dynamic programming
- reinforcement learning
- average reward
- model free
- fixed point
- decision problems
- infinite horizon
- long run
- Markov chain
- initial state
- Markov decision process
- least squares
- state space
- multistage
- function approximation
- linear programming
- objective function
- optimal control
- sufficient conditions
- decision makers
- optimal solution