Two Timescale Stochastic Approximation with Controlled Markov noise.

Prasenjit Karmakar, Shalabh Bhatnagar

Published in: CoRR (2015)

Keyphrases

stochastic approximation
Monte Carlo
Markov chain
temporal difference learning
cost function
Markov model
policy iteration