Beyond No Regret: Instance-Dependent PAC Reinforcement Learning.
Andrew J. Wagenmaker, Max Simchowitz, Kevin Jamieson
Published in: COLT (2022)
Keyphrases
- reinforcement learning
- reward function
- online learning
- lower bound
- state space
- sample complexity
- function approximation
- upper bound
- worst case
- machine learning
- model-free
- VC dimension
- supervised learning
- learning problems
- optimal policy
- temporal difference
- temporal difference learning
- expert advice
- reinforcement learning algorithms
- total reward
- learning algorithm
- binary classification
- sample size
- loss function
- PAC learning
- statistical queries
- Markov decision processes
- multi-armed bandit
- confidence bounds