Reducing Entropy Overestimation in Soft Actor Critic Using Dual Policy Network.
Hamid Ali, Hammad Majeed, Imran Usman, Khalid A. Almejalli
Published in: Wirel. Commun. Mob. Comput. (2021)
Keyphrases
- actor critic
- policy gradient
- reinforcement learning
- neuro fuzzy
- approximate dynamic programming
- temporal difference
- function approximation
- optimal control
- gradient method
- decision making
- dynamic programming
- reinforcement learning algorithms
- partially observable markov decision processes
- policy iteration
- policy gradient methods