Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples.
Tengyu Xu
Shaofeng Zou
Yingbin Liang
Published in:
CoRR (2019)
Keyphrases
asymptotic analysis
td learning
temporal difference
evaluation function
function approximation
fluid model
reinforcement learning
multi step
reinforcement learning algorithms
training set
model free
decision making
supervised learning
graphical models
step size