Concentration bounds for two time scale stochastic approximation.

Vivek S. Borkar, Sarath Pattathil

Published in: Allerton (2018)

Keyphrases

stochastic approximation
Monte Carlo
upper bound
multi start
theoretical guarantees
lower bound
worst case
reinforcement learning
dynamic programming
model free
policy iteration