Defining Admissible Rewards for High Confidence Policy Evaluation.
Niranjani Prasad, Barbara E. Engelhardt, Finale Doshi-Velez
Published in: CoRR (2019)
Keyphrases
- high confidence
- policy evaluation
- reinforcement learning
- markov decision processes
- state space
- policy iteration
- temporal difference
- least squares
- model free
- monte carlo
- association rules
- optimal policy
- function approximation
- variance reduction
- finite state
- markov chain
- reinforcement learning algorithms
- partially observable
- machine learning
- semi parametric
- learning algorithm
- dynamic programming
- reward function
- action selection
- fixed point
- average cost
- markov decision process
- class labels
- average reward
- data sets
- planning problems