Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF.
Shicong Cen
Jincheng Mei
Katayoon Goshvadi
Hanjun Dai
Tong Yang
Sherry Yang
Dale Schuurmans
Yuejie Chi
Bo Dai
Published in:
CoRR (2024)
Keyphrases
online learning
real time
global optimization
optimization problems
optimization algorithm
information systems
artificial intelligence
constrained optimization
optimal design
cross cultural
collaborative filtering
user preferences
soft constraints
joint optimization
online environment