Boosting Reward Model with Preference-Conditional Multi-Aspect Synthetic Data Generation.

Jiaming Shen, Ran Xu, Yennie Jun, Zhen Qin, Tianqi Liu, Carl Yang, Yi Liang, Simon Baumgartner, Michael Bendersky
Published in: CoRR (2024)
Keyphrases
  • multi aspect
  • probabilistic model
  • multiscale
  • data mining
  • learning algorithm
  • training data
  • reinforcement learning
  • multi agent systems
  • prior knowledge
  • sentiment analysis