Planning in Markov Decision Processes with Gap-Dependent Sample Complexity
Anders Jonsson, Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Edouard Leurent, Michal Valko
Published in: NeurIPS (2020)
Keyphrases
- Markov decision processes
- sample complexity
- decision theoretic planning
- planning under uncertainty
- partially observable
- learning problems
- theoretical analysis
- finite state
- optimal policy
- state space
- learning algorithm
- reinforcement learning
- upper bound
- supervised learning
- transition matrices
- active learning
- generalization error
- policy iteration
- lower bound
- dynamic programming
- special case
- upper and lower bounds
- training examples
- planning problems
- probabilistic planning
- heuristic search
- Markov decision problems
- sample size
- partially observable Markov decision processes
- AI planning
- average reward
- Markov decision process
- reward function
- machine learning algorithms
- infinite horizon
- cross validation
- objective function
- average cost
- Markov chain
- data sets
- real time dynamic programming