It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF.

Taiming Lu, Lingfeng Shen, Xinyu Yang, Weiting Tan, Beidi Chen, Huaxiu Yao

Published in: CoRR (2024)

Keyphrases

probabilistic model
computational model
high level
prior knowledge
theoretical analysis
cost function
probability distribution
management system
least squares
neural network model
decision model
machine learning
formal model
statistical model
mathematical model
supply chain
artificial neural networks