Off-Policy Evaluation for Human Feedback.
Qitong Gao
Ge Gao
Juncheng Dong
Vahid Tarokh
Min Chi
Miroslav Pajic
Published in:
NeurIPS (2023)
Keyphrases
policy evaluation
least squares
temporal difference
model free
Monte Carlo
matrix inversion
computer vision
reinforcement learning
policy iteration
neural network
variance reduction
cost function
graphical models
Markov decision processes
Bayesian inference