Optimal Actor-Critic Policy With Optimized Training Datasets.
Chayan Banerjee, Zhiyong Chen, Nasimul Noman, Mohsen Zamani
Published in: IEEE Trans. Emerg. Top. Comput. Intell. (2022)
Keyphrases
- actor critic
- training dataset
- optimal control
- approximate dynamic programming
- policy gradient
- average reward
- reinforcement learning
- neuro fuzzy
- control policy
- policy gradient methods
- temporal difference
- optimal policy
- dynamic programming
- reinforcement learning algorithms
- training data
- policy iteration
- markov decision processes
- gradient method
- optimal solution
- function approximation
- decision problems
- long run
- reward function
- sufficient conditions
- training set
- learning algorithm