Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning.
K. Lakshmanan, Ronald Ortner, Daniil Ryabko
Published in: ICML (2015)
Keyphrases
- reinforcement learning
- Markov decision processes
- regret bounds
- action space
- multi-armed bandit
- function approximation
- average reward
- Markov decision problems
- policy iteration
- temporal difference
- online learning
- Bayesian networks
- similarity measure
- e learning
- optimal policy
- least squares
- multi class
- model free
- upper bound
- data points
- dynamic programming
- lower bound
- decision trees