Off-Policy Evaluation with Policy-Dependent Optimization Response.
Wenshuo Guo, Michael I. Jordan, Angela Zhou
Published in: NeurIPS (2022)
Keyphrases
- policy evaluation
- least squares
- temporal difference
- Monte Carlo
- reinforcement learning
- policy iteration
- variance reduction
- optimal policy
- Markov decision processes
- model free
- optimization algorithm
- semi parametric
- function approximation
- constrained optimization
- partially observable Markov decision processes
- sample size
- average reward
- regression model
- Markov chain