Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds.
Yihao Feng, Ziyang Tang, Na Zhang, Qiang Liu
Published in: CoRR (2021)
Keyphrases
- confidence intervals
- variance reduction
- policy evaluation
- duality gap
- sample size
- monte carlo
- markov chain
- least squares
- linear programming
- linear program
- worst case
- test set
- support vector
- objective function
- temporal difference
- upper bound
- finite state
- model free
- asymptotically optimal
- optimal solution
- machine learning
- large deviations
- roc curve
- lower bound
- data sets