Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning.

Rajeeva L. Karandikar, M. Vidyasagar

Published in: CoRR (2021)

Keyphrases

stochastic approximation
reinforcement learning
Monte Carlo
policy iteration
temporal difference learning
function approximation
neural network
long run
theoretical guarantees