UCB Momentum Q-learning: Correcting the bias without forgetting.

Pierre Ménard, Omar Darwiche Domingues, Xuedong Shang, Michal Valko

Published in: CoRR (2021)

Keyphrases

learning rate
reinforcement learning
cooperative
learning algorithm
incremental learning
function approximation
multi agent
stochastic approximation
state space
database
model free
data sets
optimal policy
action selection
multi agent reinforcement learning
bucket brigade
monte carlo
reinforcement learning algorithms
variance reduction