Improving Exploration in Actor-Critic With Weakly Pessimistic Value Estimation and Optimistic Policy Optimization.
Fan Li, Mingsheng Fu, Wenyu Chen, Fan Zhang, Haixian Zhang, Hong Qu, Zhang Yi
Published in: IEEE Trans. Neural Networks Learn. Syst. (2024)
Keyphrases
- actor critic
- policy gradient
- reinforcement learning
- approximate dynamic programming
- optimal control
- temporal difference
- gradient method
- neuro fuzzy
- reinforcement learning algorithms
- average reward
- policy gradient methods
- action selection
- optimization algorithm
- optimization problems
- evaluation function
- optimization method
- natural actor critic
- step size
- optimization methods
- policy iteration
- optimal policy
- markov decision process
- markov decision processes
- learning algorithm