A Convergent Off-Policy Temporal Difference Algorithm.
Raghuram Bharadwaj Diddigi
Chandramouli Kamanchi
Shalabh Bhatnagar
Published in: CoRR (2019)
Keyphrases
dynamic programming
temporal difference
learning algorithm
search space
cost function
Monte Carlo
objective function
optimization algorithm
TD learning
reinforcement learning
artificial neural networks
learning process
linear programming
convergence rate
step size