Universal Off-Policy Evaluation.
Yash Chandak, Scott Niekum, Bruno C. da Silva, Erik G. Learned-Miller, Emma Brunskill, Philip S. Thomas
Published in: NeurIPS (2021)
Keyphrases
- policy evaluation
- least squares
- temporal difference
- Monte Carlo
- reinforcement learning
- Markov decision processes
- model free
- policy iteration
- matrix inversion
- variance reduction
- function approximation
- semi parametric
- optimal policy
- statistical inference
- state space
- evaluation function
- reinforcement learning algorithms
- Markov chain