Keyphrases
- regret bounds
- worst case
- reinforcement learning
- function approximation
- cooperative
- reward function
- state space
- online learning
- lower bound
- multi-agent
- reinforcement learning algorithms
- model-free
- confidence bounds
- expert advice
- stochastic approximation
- action selection
- loss function
- temporal difference learning
- learning algorithm
- learning rate
- optimal policy
- online convex optimization
- bandit problems
- upper bound
- linear regression
- regret minimization
- bucket brigade
- binary classification
- reinforcement learning methods
- potential field
- TD learning
- single-agent
- pairwise
- NP-hard
- multi-agent reinforcement learning
- dynamic programming
- multi-armed bandit
- computational complexity
- upper confidence bound