Guided Policy Exploration for Markov Decision Processes using an Uncertainty-Based Value-of-Information Criterion.
Isaac J. Sledge, Matthew S. Emigh, José C. Príncipe
Published in: CoRR (2018)
Keyphrases
- markov decision processes
- information criterion
- optimal policy
- policy iteration
- model based reinforcement learning
- markov decision process
- model selection
- finite horizon
- average reward
- infinite horizon
- finite state
- cross validation
- probability model
- state and action spaces
- action space
- partially observable
- average cost
- reinforcement learning
- interval estimation
- reward function
- bayesian information criterion
- state space
- decision problems
- action selection
- dynamic programming
- factor analysis
- total reward
- transition matrices
- partially observable markov decision processes
- discounted reward
- long run
- sufficient conditions
- markov decision problems
- stationary policies
- reinforcement learning algorithms
- expected reward
- decision theoretic planning
- probability distribution
- markov chain
- sample size
- conditional probabilities
- selection criterion
- machine learning