Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds.
Yihao Feng
Ziyang Tang
Na Zhang
Qiang Liu
Published in:
ICLR (2021)
Keyphrases
confidence intervals
variance reduction
policy evaluation
duality gap
Monte Carlo
sample size
Markov chain
least squares
linear programming
test set
linear program
conditional probabilities
ROC curve
temporal difference
upper bound
data sets
Markov decision processes
support vector
objective function