Policy Optimization in RLHF: The Impact of Out-of-preference Data.

Ziniu Li, Tian Xu, Yang Yu
Published in: CoRR (2023)