Off-Policy Evaluation for Action-Dependent Non-stationary Environments.
Yash Chandak
Shiv Shankar
Nathaniel D. Bastian
Bruno C. da Silva
Emma Brunskill
Philip S. Thomas
Published in:
NeurIPS (2022)
Keyphrases
policy evaluation
least squares
temporal difference
Monte Carlo
model free
reinforcement learning
variance reduction
policy iteration
Markov decision processes
function approximation
action selection
optimal policy
semi parametric
statistical inference
Markov chain
active learning
learning algorithm