Publication: Policy Gradient with Tree Search (PGTS) in Reinforcement Learning Evades Local Maxima.