Publication: Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy.