Publication: Equivalence Between Policy Gradients and Soft Q-Learning.