Publication: A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation.