UCB Momentum Q-learning: Correcting the bias without forgetting.
Pierre Ménard, Omar Darwiche Domingues, Xuedong Shang, Michal Valko
Published in: CoRR (2021)
Keyphrases
- learning rate
- reinforcement learning
- cooperative
- learning algorithm
- incremental learning
- function approximation
- multi agent
- stochastic approximation
- state space
- database
- model free
- data sets
- optimal policy
- action selection
- multi agent reinforcement learning
- bucket brigade
- Monte Carlo
- reinforcement learning algorithms
- variance reduction