On Trajectory Augmentations for Off-Policy Evaluation.
Ge Gao, Qitong Gao, Xi Yang, Song Ju, Miroslav Pajic, Min Chi
Published in: ICLR (2024)
Keyphrases
- policy evaluation
- least squares
- monte carlo
- temporal difference
- model free
- reinforcement learning
- variance reduction
- matrix inversion
- policy iteration
- function approximation
- markov decision processes
- semi parametric
- optimal policy
- decision making
- reinforcement learning algorithms
- markov chain
- evaluation function
- neural network
- statistical inference
- partially observable markov decision processes
- dynamic programming
- machine learning