Beyond No Regret: Instance-Dependent PAC Reinforcement Learning.

Andrew Wagenmaker, Max Simchowitz, Kevin G. Jamieson

Published in: CoRR (2021)

Keyphrases

reinforcement learning
lower bound
total reward
learning algorithm
reward function
reinforcement learning algorithms
sample complexity
state space
worst case
online learning
Markov decision processes
statistical queries
function approximation
upper bound
loss function
optimal policy
machine learning
binary classification
temporal difference
support vector
multi-agent
expert advice
minimax regret
multi-armed bandit
transfer learning
neural network
VC dimension
model-free
noise-tolerant