Doubly Robust Policy Evaluation and Optimization.
Miroslav Dudík
Dumitru Erhan
John Langford
Lihong Li
Published in:
CoRR (2015)
Keyphrases
policy evaluation
least squares
optimization algorithm
reinforcement learning
temporal difference
constrained optimization
learning algorithm
model free