Publication: Bi-level Optimization Method for Automatic Reward Shaping of Reinforcement Learning.