Publication: Reward-Punishment Reinforcement Learning with Maximum Entropy.