Doubly Robust Interval Estimation for Optimal Policy Evaluation in Online Learning.

Hengrui Cai, Ye Shen, Rui Song

Published in: CoRR (2021)

Keyphrases

online learning
policy evaluation
interval estimation
least squares
dynamic programming
optimal solution
optimal control
e-learning
learning algorithm
worst case
lower bound
search space
NP-hard
neural network
reinforcement learning
Monte Carlo
function approximation
model free
policy iteration