Finite sample analysis of the GTD Policy Evaluation Algorithms in Markov Setting.
Yue Wang
Wei Chen
Yuting Liu
Zhiming Ma
Tie-Yan Liu
Published in:
NIPS (2017)
Keyphrases
policy evaluation
finite sample
least squares
worst case
neural network
learning algorithm
upper bound
error bounds
convergence rate
temporal difference
policy iteration