Optimal Off-Policy Evaluation from Multiple Logging Policies.
Nathan Kallus
Yuta Saito
Masatoshi Uehara
Published in:
CoRR (2020)
Keyphrases
optimal policy
worst case
policy evaluation
NP-hard
dynamic programming
TD learning
learning algorithm
feature selection
decision making
image sequences
computational complexity
probabilistic model
Monte Carlo