Publication: Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization.