Online Regret Bounds for Undiscounted Continuous Reinforcement Learning
Ronald Ortner
Daniil Ryabko
Published in:
CoRR (2013)
Keyphrases
reinforcement learning
Markov decision processes
online learning
regret bounds
action space
multi-armed bandit
function approximation
online convex optimization
state space
average reward
optimal policy
policy iteration
Markov decision problems
machine learning
stochastic games
learning process
learning algorithm