Minimax Weight and Q-Function Learning for Off-Policy Evaluation.
Masatoshi Uehara
Jiawei Huang
Nan Jiang
Published in:
ICML (2020)
Keyphrases
learning algorithm
reinforcement learning
supervised learning
TD learning
artificial neural networks
active learning
least squares
utility function
learning tasks
evaluation function
temporal difference