Boosting Reward Model with Preference-Conditional Multi-Aspect Synthetic Data Generation.
Jiaming Shen
Ran Xu
Yennie Jun
Zhen Qin
Tianqi Liu
Carl Yang
Yi Liang
Simon Baumgartner
Michael Bendersky
Published in:
CoRR (2024)
Keyphrases
multi aspect
probabilistic model
multiscale
data mining
learning algorithm
training data
reinforcement learning
multi agent systems
prior knowledge
sentiment analysis