Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning.

Mingqi Yuan, Bo Li, Xin Jin, Wenjun Zeng

Published in: CoRR (2022)

Keyphrases

reinforcement learning
active exploration
exploration strategy
action selection
model-based reinforcement learning
exploration-exploitation
function approximation
exploration-exploitation tradeoff
autonomous learning
model-free
temporal difference
Markov decision processes
reinforcement learning algorithms
active learning
supervised learning
optimal control
information visualization
learning algorithm
learning process
state space
multi-agent reinforcement learning
machine learning
artificial neural networks
policy search
stochastic approximation
temporal difference learning
action space
data sets