Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective.
Dylan J. Foster, Alexander Rakhlin, David Simchi-Levi, Yunzong Xu
Published in: CoRR (2020)
Keyphrases
- reinforcement learning
- optimal policy
- decision problems
- computational complexity
- contextual information
- Markov decision processes
- function approximation
- complexity analysis
- neural network
- machine learning
- hidden Markov models
- state space
- least squares
- reinforcement learning algorithms
- function approximators
- multi-armed bandit