Gaussian Approximation and Multiplier Bootstrap for Polyak-Ruppert Averaged Linear Stochastic Approximation with Applications to TD Learning.
Sergey Samsonov
Eric Moulines
Qi-Man Shao
Zhuo-Song Zhang
Alexey Naumov
Published in:
CoRR (2024)
Keyphrases
stochastic approximation
TD learning
temporal difference
Monte Carlo
policy iteration
policy evaluation
temporal difference learning
reinforcement learning
evaluation function
function approximation
function approximators
reinforcement learning algorithms
confidence intervals
model free
theoretical guarantees