Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning.
Mingqi Yuan, Bo Li, Xin Jin, Wenjun Zeng
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- active exploration
- exploration strategy
- action selection
- model based reinforcement learning
- exploration exploitation
- function approximation
- exploration exploitation tradeoff
- autonomous learning
- model free
- temporal difference
- Markov decision processes
- reinforcement learning algorithms
- active learning
- supervised learning
- optimal control
- information visualization
- learning algorithm
- learning process
- state space
- multi agent reinforcement learning
- machine learning
- artificial neural networks
- policy search
- stochastic approximation
- temporal difference learning
- action space
- data sets