Off-Policy Actor-Critic in an Ensemble: Achieving Maximum General Entropy and Effective Environment Exploration in Deep Reinforcement Learning.
Gang Chen, Yiming Peng
Published in: CoRR (2019)
Keyphrases
- reinforcement learning
- actor critic
- function approximation
- policy gradient
- temporal difference
- optimal control
- reinforcement learning algorithms
- approximate dynamic programming
- mobile robot
- dynamic environments
- Markov decision processes
- action selection
- machine learning
- gradient method
- model free
- neural network
- learning agent
- supervised learning
- training set
- policy gradient methods