Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy.
Yuan Xie
Boyi Liu
Qiang Liu
Zhaoran Wang
Yuan Zhou
Jian Peng
Published in:
ICLR (Poster) (2019)
Keyphrases
error reduction
active learning
learning algorithm
reinforcement learning
learning process
policy evaluation
feature selection
supervised learning