Beyond No Regret: Instance-Dependent PAC Reinforcement Learning.

Andrew J. Wagenmaker, Max Simchowitz, Kevin Jamieson

Published in: COLT (2022)

Keyphrases

reinforcement learning
reward function
online learning
lower bound
state space
sample complexity
function approximation
upper bound
worst case
machine learning
model-free
VC dimension
supervised learning
learning problems
optimal policy
temporal difference
temporal difference learning
expert advice
reinforcement learning algorithms
total reward
learning algorithm
binary classification
sample size
loss function
PAC learning
statistical queries
Markov decision processes
multi-armed bandit
confidence bounds