OPERA: Automatic Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators.
Allen Nie
Yash Chandak
Christina J. Yuan
Anirudhan Badrinath
Yannis Flet-Berliac
Emma Brunskill
Published in: CoRR (2024)
Keyphrases
policy evaluation
least squares
Monte Carlo
function approximation
model free
temporal difference
variance reduction
machine learning
learning algorithm
data streams
belief revision
policy iteration