Publication: Online learning control based on projected gradient temporal difference and advanced heuristic dynamic programming.