Modified value-function-approximation for synchronous policy iteration with single-critic configuration for nonlinear optimal control.
Difan Tang, Lei Chen, Zhao Feng Tian, Eric Hu
Published in: Int. J. Control (2021)
Keyphrases
- optimal control
- policy iteration
- actor critic
- temporal difference
- infinite horizon
- control problems
- approximate dynamic programming
- reinforcement learning
- Markov decision processes
- linear quadratic
- dynamic programming
- temporal difference learning
- policy gradient
- control strategy
- continuous stirred tank reactor
- model free
- optimal policy
- policy evaluation
- function approximation
- state space
- fixed point
- control law
- Markov decision problems
- policy iteration algorithm
- least squares
- average reward
- reinforcement learning algorithms
- evaluation function
- Markov decision process
- finite state
- Markov games
- average cost
- reward function
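Several of the keyphrases above (policy iteration, policy evaluation, Markov decision processes, optimal policy) refer to the classical policy-iteration scheme that the paper builds on. A minimal sketch on a hypothetical two-state, two-action MDP (the dynamics, rewards, and discount factor below are illustrative assumptions, not taken from the paper) alternates exact policy evaluation with greedy improvement until the policy stops changing:

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP for illustration only:
# P[a, s, s'] = transition probability, R[a, s] = expected immediate reward.
P = np.array([
    [[0.9, 0.1],    # action 0
     [0.2, 0.8]],
    [[0.1, 0.9],    # action 1
     [0.6, 0.4]],
])
R = np.array([
    [1.0, 0.0],     # action 0 rewards in states 0, 1
    [0.0, 2.0],     # action 1 rewards in states 0, 1
])
gamma = 0.9         # discount factor (assumed)

def policy_evaluation(policy):
    """Solve the linear Bellman system (I - gamma * P_pi) v = r_pi exactly."""
    n = P.shape[1]
    P_pi = np.array([P[policy[s], s] for s in range(n)])
    r_pi = np.array([R[policy[s], s] for s in range(n)])
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

def policy_iteration():
    policy = np.zeros(2, dtype=int)
    while True:
        v = policy_evaluation(policy)
        # Greedy improvement: Q(s, a) = R[a, s] + gamma * sum_s' P[a, s, s'] v[s']
        q = R + gamma * P @ v
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v   # policy is a fixed point of improvement
        policy = new_policy

pi, v = policy_iteration()
print("optimal policy:", pi, "values:", v)
```

In the paper's continuous-time, function-approximation setting, the exact linear solve in `policy_evaluation` is replaced by a critic network that approximates the value function, but the evaluate-then-improve structure is the same.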