Policy-Adaptive Estimator Selection for Off-Policy Evaluation.
Takuma Udagawa, Haruka Kiyohara, Yusuke Narita, Yuta Saito, Kei Tateno
Published in: CoRR (2022)
Keyphrases
- policy evaluation
- least squares
- variance reduction
- Monte Carlo
- temporal difference
- model free
- policy iteration
- reinforcement learning
- Markov decision processes
- function approximation
- optimal policy
- semi parametric
- linear regression
- partially observable Markov decision processes
- importance sampling
- dynamical systems
- decision problems