Minimax Confidence Interval for Off-Policy Evaluation and Policy Optimization.
Nan Jiang, Jiawei Huang
Published in: CoRR (2020)
Keyphrases
- policy evaluation
- confidence intervals
- variance reduction
- Monte Carlo
- least squares
- temporal difference
- reinforcement learning
- Markov decision processes
- policy iteration
- model free
- sample size
- function approximation
- optimal policy
- Markov chain
- evaluation function
- semi parametric
- data sets
- constrained optimization
- high dimensional data
- statistical inference