Publication: A short variational proof of equivalence between policy gradients and soft Q learning.