Estimating the Reliability of MDP Policies: a Confidence Interval Approach.
Joel R. Tetreault, Dan Bohus, Diane J. Litman
Published in: HLT-NAACL (2007)
Keyphrases
- confidence intervals
- optimal policy
- Markov decision process
- failure rate
- Markov decision processes
- reward function
- sample size
- partially observable Markov decision processes
- Markov chain
- Markov decision problems
- state space
- stochastic systems
- utility function
- Monte Carlo
- finite state
- chi-square
- infinite horizon
- dynamic programming
- reinforcement learning
- sufficiently small
- data sets
- test set
- learning algorithm
- initial state
- average cost
- policy iteration
- conditional probabilities
- stationary policies
- semi-supervised