Universal Off-Policy Evaluation.
Yash Chandak, Scott Niekum, Bruno Castro da Silva, Erik G. Learned-Miller, Emma Brunskill, Philip S. Thomas
Published in: CoRR (2021)
Keyphrases
- policy evaluation
- least squares
- Monte Carlo
- temporal difference
- reinforcement learning
- model-free
- matrix inversion
- Markov decision processes
- variance reduction
- policy iteration
- semi-parametric
- function approximation
- optimal policy
- evaluation function
- optical flow
- statistical inference
- sample size
- reinforcement learning algorithms
- partially observable Markov decision processes
- multi-agent
- machine learning