Off-Policy Evaluation for Human Feedback.

Qitong Gao, Ge Gao, Juncheng Dong, Vahid Tarokh, Min Chi, Miroslav Pajic
Published in: CoRR (2023)
Keyphrases
  • policy evaluation
  • Monte Carlo
  • semi-parametric
  • matrix inversion
  • support vector
  • least squares
  • model-free
  • variance reduction
  • machine learning
  • optimal policy
  • temporal difference