Off-Policy Evaluation of Bandit Algorithm from Dependent Samples under Batch Update Policy.
Masahiro Kato
Yusuke Kaneko
Published in:
CoRR (2020)
Keyphrases
learning algorithm
cost function
objective function
search space
policy evaluation
NP-hard
dynamic programming
Monte Carlo
machine learning
optimal solution
sufficient conditions
support vector machine (SVM)
optimal policy
constrained optimization