Stability of Stochastic Approximations with 'Controlled Markov' Noise and Temporal Difference Learning.

Arunselvan Ramaswamy, Shalabh Bhatnagar

Published in: CoRR (2015)

Keyphrases

temporal difference learning
fixed point
function approximation
game playing
monte carlo
reinforcement learning
approximate value iteration
evaluation function
markov chain
temporal difference
noise level
reinforcement learning algorithms
neural network
markov model
markov decision process
policy iteration