Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

Ronald Ortner, Daniil Ryabko

Published in: CoRR (2013)

Keyphrases

reinforcement learning
Markov decision processes
online learning
regret bounds
action space
multi-armed bandit
function approximation
online convex optimization
state space
average reward
optimal policy
policy iteration
Markov decision problems
machine learning
stochastic games
learning process
learning algorithm