Consistent On-Line Off-Policy Evaluation.
Assaf Hallak
Shie Mannor
Published in:
CoRR (2017)
Keyphrases
policy evaluation
least squares
temporal difference
policy iteration
Monte Carlo
model free
reinforcement learning
Markov decision processes
matrix inversion
variance reduction
function approximation
semi parametric
step size
optimal policy
evaluation function