Improved Soft Actor-Critic: Mixing Prioritized Off-Policy Samples with On-Policy Experience.
Chayan Banerjee, Zhiyong Chen, Nasimul Noman
Published in: CoRR (2021)
Keyphrases
- actor critic
- policy gradient
- reinforcement learning
- optimal control
- approximate dynamic programming
- policy gradient methods
- neuro fuzzy
- temporal difference
- gradient method
- natural actor critic
- reinforcement learning algorithms
- policy iteration
- average reward
- search space
- function approximation
- optimal policy
- learning algorithm
- training set
- linear program
- multi agent systems
- dynamic programming