Concentration Bounds for Two Timescale Stochastic Approximation with Applications to Reinforcement Learning.
Gal Dalal
Balázs Szörényi
Gugan Thoppe
Shie Mannor
Published in:
CoRR (2017)
Keyphrases
stochastic approximation
reinforcement learning
Monte Carlo
theoretical guarantees
upper bound
lower bound
temporal difference learning
policy iteration
worst case
neural network
learning process
NP-hard
Markov decision processes
multi-agent
optimal policy