Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning
Odalric-Ambrym Maillard
Phuong Nguyen
Ronald Ortner
Daniil Ryabko
Published in:
CoRR (2013)
Keyphrases
reinforcement learning
regret bounds
state space
dynamic programming
multi-armed bandit
Markov decision processes
learning algorithm
learning process
online learning
optimal solution
support vector