Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs.
Max Simchowitz
Kevin G. Jamieson
Published in:
NeurIPS (2019)
Keyphrases
regret bounds
Markov decision processes
reinforcement learning
state space
multi-armed bandit
dynamic programming
online learning
optimal policy
linear regression
kNN
lower bound
probability distribution
worst case