Better Exploration with Optimistic Actor-Critic.
Kamil Ciosek, Quan Vuong, Robert Loftin, Katja Hofmann
Published in: CoRR (2019)
Keyphrases
- actor critic
- reinforcement learning
- policy gradient
- temporal difference
- optimal control
- approximate dynamic programming
- gradient method
- neuro fuzzy
- function approximation
- action selection
- reinforcement learning algorithms
- policy iteration
- multi agent
- evolutionary algorithm
- active learning
- fuzzy logic
- average reward
- machine learning