Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs.

Max Simchowitz, Kevin G. Jamieson

Published in: CoRR (2019)

Keyphrases

regret bounds
markov decision processes
reinforcement learning
multi-armed bandit
state space
worst case
optimal policy
linear regression
long run
online convex optimization
k-NN
upper bound
nearest neighbor