Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path.

Published in: Mach. Learn. (2008)

Keyphrases