Off-Policy Evaluation for Human Feedback.
Qitong Gao
Ge Gao
Juncheng Dong
Vahid Tarokh
Min Chi
Miroslav Pajic
Published in:
CoRR (2023)
Keyphrases
policy evaluation
Monte Carlo
semi-parametric
matrix inversion
support vector
least squares
model-free
variance reduction
machine learning
optimal policy
temporal difference