Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction.
Jiachen Li, Shuo Cheng, Zhenyu Liao, Huayan Wang, William Yang Wang, Qinxun Bai
Published in: CoRR (2021)
Keyphrases
- reinforcement learning
- action selection
- function approximation
- exploration strategy
- active exploration
- multi-agent
- robotic control
- model-based reinforcement learning
- state space
- supervised learning
- random variables
- exploration exploitation
- Markov decision processes
- Gaussian distribution
- autonomous learning
- information visualization
- balancing exploration and exploitation
- visualization tool
- model-free
- spatial distribution
- neural network
- data distribution
- graphical models
- probabilistic model
- active learning
- data mining