V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control.
H. Francis Song, Abbas Abdolmaleki, Jost Tobias Springenberg, Aidan Clark, Hubert Soyer, Jack W. Rae, Seb Noury, Arun Ahuja, Siqi Liu, Dhruva Tirumala, Nicolas Heess, Dan Belov, Martin A. Riedmiller, Matthew M. Botvinick
Published in: ICLR (2020)