Off-policy evaluation for MDPs with unknown structure.
Assaf Hallak
François Schnitzler
Timothy A. Mann
Shie Mannor
Published in: CoRR (2015)
Keyphrases
policy evaluation
markov decision processes
reinforcement learning
least squares
policy iteration
temporal difference
model-free
function approximation
monte carlo
objective function
multi-agent
statistical inference