A Convergent Off-Policy Temporal Difference Algorithm.
Raghuram Bharadwaj Diddigi
Chandramouli Kamanchi
Shalabh Bhatnagar
Published in:
ECAI (2020)
Keyphrases
learning algorithm
cost function
temporal difference
Monte Carlo
td learning
dynamic programming
model free
reinforcement learning
optimization algorithm
pairwise
simulated annealing
support vector machine (SVM)
policy iteration
temporal difference learning