West-of-N: Synthetic Preference Generation for Improved Reward Modeling.

Alizée Pace, Jonathan Mallinson, Eric Malmi, Sebastian Krause, Aliaksei Severyn
Published in: CoRR (2024)
Keyphrases
  • reinforcement learning
  • real world
  • three dimensional
  • long run
  • modeling method
  • data sets
  • databases
  • image processing
  • expert systems
  • real scenes