Publication: Cold-Start Reinforcement Learning with Softmax Policy Gradient.