Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning.

Published in: Math. Oper. Res. (2018)

Keyphrases