Beyond Reward: Offline Preference-guided Policy Optimization.

Yachen Kang, Diyuan Shi, Jinxin Liu, Li He, Donglin Wang
Published in: CoRR (2023)