Provably Safe PAC-MDP Exploration Using Analogies.
Melrose Roderick, Vaishnavh Nagarajan, J. Zico Kolter
Published in: CoRR (2020)
Keyphrases
- Markov decision processes
- exploration strategy
- state space
- reinforcement learning
- Markov decision process
- upper bound
- optimal policy
- finite state
- sample size
- semantic relations
- dynamic programming algorithms
- sample complexity
- worst case
- linear program
- learning algorithm
- computational model
- PAC learning
- policy iteration
- analogical reasoning
- mistake bound
- dynamic programming
- sample complexity bounds
- computational models
- linear programming
- VC dimension
- action selection
- special case