Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation.
Haruka Kiyohara
Ren Kishimoto
Kosuke Kawakami
Ken Kobayashi
Kazuhide Nakata
Yuta Saito
Published in:
CoRR (2023)
Keyphrases
policy evaluation
least squares
temporal difference
reinforcement learning
Monte Carlo
model free
Markov decision processes
matrix inversion
policy iteration
variance reduction
function approximation
semi parametric
optimal policy
machine learning
Markov chain
semi supervised
computational complexity