Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples.
Tengyu Xu
Shaofeng Zou
Yingbin Liang
Published in:
NeurIPS (2019)
Keyphrases
asymptotic analysis
TD learning
temporal difference
evaluation function
fluid model
function approximation
multi step
training set
dynamic programming
Monte Carlo
action selection
policy evaluation