Computing the Bias of Constant-step Stochastic Approximation with Markovian Noise.

Sebastian Allmeier Nicolas Gast

Published in: CoRR (2024)

Keyphrases

stochastic approximation
monte carlo
policy iteration
temporal difference learning
reinforcement learning
least squares