It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF.
Taiming Lu
Lingfeng Shen
Xinyu Yang
Weiting Tan
Beidi Chen
Huaxiu Yao
Published in:
CoRR (2024)
Keyphrases
probabilistic model
computational model
high level
prior knowledge
theoretical analysis
cost function
probability distribution
management system
least squares
neural network model
decision model
machine learning
formal model
statistical model
mathematical model
supply chain
artificial neural networks