Off-Policy Evaluation from Logged Human Feedback.
Aniruddha Bhargava
Lalit Jain
Branislav Kveton
Ge Liu
Subhojyoti Mukherjee
Published in:
CoRR (2024)
Keyphrases
policy evaluation
least squares
temporal difference
reinforcement learning
model free
monte carlo
variance reduction
matrix inversion
computer vision
support vector