Beyond No Regret: Instance-Dependent PAC Reinforcement Learning.
Andrew Wagenmaker, Max Simchowitz, Kevin G. Jamieson. Published in: CoRR (2021)
Keyphrases
- reinforcement learning
- lower bound
- total reward
- learning algorithm
- reward function
- reinforcement learning algorithms
- sample complexity
- state space
- worst case
- online learning
- markov decision processes
- statistical queries
- function approximation
- upper bound
- loss function
- optimal policy
- machine learning
- binary classification
- temporal difference
- support vector
- multi agent
- expert advice
- minimax regret
- multi armed bandit
- transfer learning
- neural network
- vc dimension
- model free
- noise tolerant