Publication: Policy Gradient using Weak Derivatives for Reinforcement Learning.