Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs.

Max Simchowitz, Kevin G. Jamieson

Published in: NeurIPS (2019)

Keyphrases

regret bounds
Markov decision processes
reinforcement learning
state space
multi-armed bandit
dynamic programming
online learning
optimal policy
linear regression
kNN
lower bound
probability distribution
worst case