A partial policy iteration ADP algorithm for nonlinear neuro-optimal control with discounted total reward.
Mingming Liang, Qinglai Wei. Published in: Neurocomputing (2021)
Keyphrases
- policy iteration
- optimal control
- average reward
- Markov decision processes
- dynamic programming
- total reward
- infinite horizon
- optimal policy
- reinforcement learning
- actor critic
- learning algorithm
- cost function
- optimality criterion
- particle swarm optimization
- linear programming
- linear program
- convergence rate
- state space
- model free
- Markov decision process
- Markov decision problems
- policy gradient