Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning.

Peter Auer, Ronald Ortner

Published in: NIPS (2006)

Keyphrases

regret bounds
reinforcement learning
online learning
Markov decision processes
online convex optimization
multi-armed bandit
linear regression
policy iteration
lower bound
learning algorithm
model-free
average reward
upper bound
learning process
optimal policy
function approximation
optimal control
infinite horizon
temporal difference
stochastic games
Markov decision problems