Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning.
Peter Auer, Ronald Ortner
Published in: NIPS (2006)
Keyphrases
- regret bounds
- reinforcement learning
- online learning
- Markov decision processes
- online convex optimization
- multi-armed bandit
- linear regression
- policy iteration
- lower bound
- learning algorithm
- model free
- average reward
- upper bound
- learning process
- optimal policy
- function approximation
- optimal control
- infinite horizon
- temporal difference
- stochastic games
- Markov decision problems