Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF.
Tengyang Xie
Dylan J. Foster
Akshay Krishnamurthy
Corby Rosset
Ahmed Awadallah
Alexander Rakhlin
Published in:
CoRR (2024)
Keyphrases
global optimization
optimization process
efficient computation
error bounds
approximation error
information systems
constrained optimization
open ended