Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy.
Yuan Xie
Boyi Liu
Qiang Liu
Zhaoran Wang
Yuan Zhou
Jian Peng
Published in: CoRR (2018)
Keyphrases
error reduction
reinforcement learning
learning tasks
learning process
supervised learning
policy evaluation
learning algorithm
active learning
least squares
evaluation function
feature selection
training data
training set
Markov decision processes
temporal difference