Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy.

Published in: ICLR (Poster) (2019)

Keyphrases