Best Policy Identification in discounted MDPs: Problem-specific Sample Complexity.
Aymen Al Marjani, Alexandre Proutière
Published in: CoRR (2020)
Keyphrases
- sample complexity
- optimal policy
- Markov decision processes
- finite horizon
- infinite horizon
- Markov decision process
- average reward
- average cost
- PAC learning
- learning problems
- theoretical analysis
- active learning
- reinforcement learning
- supervised learning
- upper bound
- lower bound
- dynamic programming
- VC dimension
- Markov decision problems
- special case
- policy iteration
- discounted reward
- policy search
- state space
- partially observable
- learning algorithm
- long run
- finite state
- decision problems
- partially observable Markov decision processes
- generalization error
- training examples
- state and action spaces
- reward function
- sample size
- sufficient conditions
- stationary policies
- support vector
- data mining
- sample complexity bounds