Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF.

Tengyang Xie, Dylan J. Foster, Akshay Krishnamurthy, Corby Rosset, Ahmed Awadallah, Alexander Rakhlin
Published in: CoRR (2024)
Keyphrases
  • global optimization
  • optimization process
  • efficient computation
  • error bounds
  • approximation error
  • information systems
  • constrained optimization
  • open ended