Representation Balancing MDPs for Off-policy Policy Evaluation.
Yao Liu
Omer Gottesman
Aniruddh Raghu
Matthieu Komorowski
Aldo A. Faisal
Finale Doshi-Velez
Emma Brunskill
Published in:
NeurIPS (2018)
Keyphrases
policy evaluation
Markov decision processes
least squares
reinforcement learning
policy iteration
Monte Carlo
temporal difference
state space
optimal policy
function approximation
model-free
variance reduction
decision making
semi-supervised
Markov chain
Markov decision problems