Data-Efficient Policy Evaluation Through Behavior Policy Search.
Josiah P. Hanna
Philip S. Thomas
Peter Stone
Scott Niekum
Published in:
ICML (2017)
Keyphrases
training data
probability distribution
dynamic programming