UCB Momentum Q-learning: Correcting the bias without forgetting.
Pierre Ménard, Omar Darwiche Domingues, Xuedong Shang, Michal Valko
Published in: ICML (2021)
Keyphrases
- learning rate
- reinforcement learning
- learning algorithm
- cooperative
- function approximation
- convergence rate
- incremental learning
- state space
- multi-agent
- stochastic approximation
- multi-armed bandit
- convergence speed
- neural network
- model-free
- action selection
- reinforcement learning algorithms
- temporal difference learning
- data sets
- bandit problems
- database