Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs.
Max Simchowitz
Kevin G. Jamieson
Published in: CoRR (2019)
Keyphrases
regret bounds
markov decision processes
reinforcement learning
multi armed bandit
state space
worst case
optimal policy
linear regression
long run
online convex optimization
knn
upper bound
nearest neighbor