Doubly Robust Interval Estimation for Optimal Policy Evaluation in Online Learning.
Hengrui Cai
Ye Shen
Rui Song
Published in:
CoRR (2021)
Keyphrases
online learning
policy evaluation
interval estimation
least squares
dynamic programming
optimal solution
optimal control
e-learning
learning algorithm
worst case
lower bound
search space
NP-hard
neural network
reinforcement learning
Monte Carlo
function approximation
model free
policy iteration