OPERA: Automatic Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators.

Allen Nie, Yash Chandak, Christina J. Yuan, Anirudhan Badrinath, Yannis Flet-Berliac, Emma Brunskill
Published in: CoRR (2024)
Keyphrases
  • policy evaluation
  • least squares
  • Monte Carlo
  • function approximation
  • model free
  • temporal difference
  • variance reduction
  • machine learning
  • learning algorithm
  • data streams
  • belief revision
  • policy iteration