Combining No-regret and Q-learning.
Ian A. Kash, Michael Sullins, Katja Hofmann
Published in: AAMAS (2020)
Keyphrases
- reinforcement learning
- learning algorithm
- cooperative
- state space
- lower bound
- online learning
- expert advice
- active learning
- function approximation
- learning rate
- reward function
- reinforcement learning algorithms
- pairwise
- control system
- dynamic programming
- machine learning
- optimal policy
- combining multiple
- action selection