Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding.
Hongseok Namkoong, Ramtin Keramati, Steve Yadlowsky, Emma Brunskill
Published in: CoRR (2020)
Keyphrases
- policy evaluation
- least squares
- Monte Carlo
- temporal difference
- reinforcement learning
- Markov decision processes
- model free
- matrix inversion
- policy iteration
- decision making
- decision makers
- function approximation
- variance reduction
- latent variables
- semi parametric
- upper bound
- evaluation function
- decision process
- hidden variables
- linear programming