Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning.

Prasenjit Karmakar, Shalabh Bhatnagar
Published in: Math. Oper. Res. (2018)
Keyphrases