Representation Balancing MDPs for Off-Policy Policy Evaluation.
Yao Liu
Omer Gottesman
Aniruddh Raghu
Matthieu Komorowski
Aldo Faisal
Finale Doshi-Velez
Emma Brunskill
Published in:
CoRR (2018)
Keyphrases
policy evaluation
Markov decision processes
least squares
policy iteration
reinforcement learning
temporal difference
Monte Carlo
model free
function approximation
state space
optimal policy
variance reduction
dynamic programming
probabilistic model
machine learning
evaluation function
average reward