Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters.
Aniruddh Raghu, Omer Gottesman, Yao Liu, Matthieu Komorowski, Aldo Faisal, Finale Doshi-Velez, Emma Brunskill
Published in: CoRR (2018)
Keyphrases
- policy evaluation
- semi-parametric
- least squares
- policy iteration
- reinforcement learning
- temporal difference
- model-free
- Monte Carlo
- Markov decision processes
- function approximation
- optimal policy
- variance reduction
- parameter estimation
- density estimation
- regression model
- statistical inference
- constrained optimization
- semi-supervised
- importance sampling
- average reward
- policy gradient
- Gaussian process
- partially observable Markov decision processes
- action selection
- linear regression
- evaluation function
- Markov chain
- multi-agent