Guided Policy Exploration for Markov Decision Processes Using an Uncertainty-Based Value-of-Information Criterion
Isaac J. Sledge, Matthew S. Emigh, José C. Príncipe. Published in: IEEE Trans. Neural Networks Learn. Syst. (2018)
Keyphrases
- markov decision processes
- information criterion
- optimal policy
- policy iteration
- model based reinforcement learning
- markov decision process
- model selection
- cross validation
- average reward
- action space
- infinite horizon
- partially observable
- probability model
- state space
- finite state
- state and action spaces
- average cost
- finite horizon
- interval estimation
- reward function
- decision problems
- reinforcement learning
- bayesian information criterion
- discounted reward
- decision theoretic planning
- total reward
- dynamic programming
- markov decision problems
- action selection
- expected reward
- transition matrices
- reinforcement learning algorithms
- stationary policies
- widely applicable
- partially observable markov decision processes
- long run
- sample size
- training set
- selection criterion
- sufficient conditions
- monte carlo
- learning theory
- decision theory
- feature selection