Defining admissible rewards for high-confidence policy evaluation in batch reinforcement learning.
Niranjani Prasad, Barbara E. Engelhardt, Finale Doshi-Velez
Published in: CHIL (2020)
Keyphrases
- high confidence
- policy evaluation
- reinforcement learning
- state space
- temporal difference
- Markov decision processes
- model free
- function approximation
- association rules
- policy iteration
- TD learning
- reinforcement learning algorithms
- dynamic programming
- optimal policy
- learning algorithm
- action selection
- action space
- learning process
- Monte Carlo
- Markov decision problems
- dynamical systems
- reward function
- supervised learning
- least squares
- prior knowledge
- transfer learning
- Markov decision process
- class labels
- function approximators
- small number