Off-Policy Evaluation from Logged Human Feedback.

Aniruddha Bhargava, Lalit Jain, Branislav Kveton, Ge Liu, Subhojyoti Mukherjee
Published in: CoRR (2024)
Keyphrases
  • policy evaluation
  • least squares
  • temporal difference
  • reinforcement learning
  • model free
  • monte carlo
  • variance reduction
  • matrix inversion
  • computer vision
  • support vector