Mechanizing Soundness of Off-Policy Evaluation.
Jared Yeager, J. Eliot B. Moss, Michael Norrish, Philip S. Thomas
Published in: ITP (2022)
Keyphrases
- policy evaluation
- least squares
- Monte Carlo
- temporal difference
- reinforcement learning
- Markov decision processes
- policy iteration
- variance reduction
- model free
- matrix inversion
- function approximation
- semi parametric
- optimal policy
- dynamic programming
- Gaussian process
- statistical inference
- evaluation function
- importance sampling
- finite state
- Markov decision problems