Keyphrases
- reinforcement learning
- reward function
- online learning
- function approximation
- total reward
- multi-agent
- lower bound
- reinforcement learning algorithms
- state space
- dynamic programming
- optimal policy
- Markov decision processes
- learning algorithm
- robotic control
- optimal control
- multi-armed bandit
- expert advice
- model-free
- binary classification
- learning problems
- worst case
- action selection
- temporal difference
- data sets
- action space
- function approximators
- online algorithms
- minimax regret
- dynamical systems
- loss function
- confidence bounds