Publication: Q-learning and enhanced policy iteration in discounted dynamic programming.