Temporal-difference learning for nonlinear value function approximation in the lazy training regime.

Andrea Agazzi, Jianfeng Lu

Published in: CoRR (2019)

Keyphrases

temporal difference learning
fixed point
function approximation
temporal difference
reinforcement learning
evaluation function
game playing
approximate value iteration
Monte Carlo
reinforcement learning algorithms
Markov decision process
training set
supervised learning
Bayesian networks