Publication: Policy Optimization with Smooth Guidance Rewards Learned from Sparse-Reward Demonstrations.