UCB Momentum Q-learning: Correcting the Bias without Forgetting

Pierre Ménard, Omar Darwiche Domingues, Xuedong Shang, Michal Valko

Published in: ICML (2021)

Keyphrases

learning rate
reinforcement learning
learning algorithm
cooperative
function approximation
convergence rate
incremental learning
state space
multi-agent
stochastic approximation
multi-armed bandit
convergence speed
neural network
model-free
action selection
reinforcement learning algorithms
temporal difference learning
data sets
bandit problems
database