Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF.

Shicong Cen, Jincheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi, Bo Dai
Published in: CoRR (2024)