Optimal Off-Policy Evaluation from Multiple Logging Policies.
Nathan Kallus
Yuta Saito
Masatoshi Uehara
Published in:
ICML (2021)
Keyphrases
policy evaluation
optimal policy
policy iteration
Monte Carlo
dynamic programming
optimal control
least squares
Markov decision processes
fixed point
average reward