West-of-N: Synthetic Preference Generation for Improved Reward Modeling.
Alizée Pace
Jonathan Mallinson
Eric Malmi
Sebastian Krause
Aliaksei Severyn
Published in:
CoRR (2024)
Keyphrases
reinforcement learning
real world
three dimensional
long run
modeling method
data sets
databases
image processing
expert systems
real scenes