Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling.
Tengyang Xie, Yifei Ma, Yu-Xiang Wang
Published in: NeurIPS (2019)
Keyphrases
- policy evaluation
- importance sampling
- Monte Carlo
- variance reduction
- reinforcement learning
- temporal difference
- least squares
- model free
- Markov chain
- dynamic programming
- Markov decision processes
- Kalman filter
- particle filtering
- function approximation
- policy iteration
- optimal solution
- state space
- pairwise
- worst case
- evaluation function
- closed form
- computer vision
- sample size
- optimal policy
- feature selection