Publication: On a convergent off-policy temporal difference learning algorithm in on-line learning environment.