Off-policy Evaluation in Doubly Inhomogeneous Environments.
Zeyu Bian, Chengchun Shi, Zhengling Qi, Lan Wang
Published in: CoRR (2023)
Keyphrases
- policy evaluation
- least squares
- temporal difference
- Monte Carlo
- reinforcement learning
- Markov decision processes
- model free
- policy iteration
- variance reduction
- matrix inversion
- semi parametric
- function approximation
- dynamic environments
- training data
- partially observable Markov decision processes
- fixed point
- regression model
- objective function
- image sequences