Control Variates for Slate Off-Policy Evaluation.
Nikos Vlassis
Ashok Chandrashekar
Fernando Amat Gil
Nathan Kallus
Published in:
CoRR (2021)
Keyphrases
policy evaluation
least squares
control system
temporal difference
reinforcement learning
model free
matrix inversion
optimal control
variance reduction
state space
regression model
monte carlo
policy iteration