Publication: Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback.