Near-optimal Regret Bounds for Reinforcement Learning.
Thomas Jaksch, Ronald Ortner, Peter Auer. Published in: J. Mach. Learn. Res. (2010)
Keyphrases
- reinforcement learning
- regret bounds
- multi-armed bandit
- state space
- linear regression
- online learning
- temporal difference
- supervised learning
- learning algorithm
- model-free
- lower bound
- optimal policy
- Markov decision processes
- learning process
- least squares
- pairwise
- information-theoretic
- support vector
- similarity measure
- image sequences
- machine learning