Stabilizing RLHF through Advantage Model and Selective Rehearsal.
Baolin Peng
Linfeng Song
Ye Tian
Lifeng Jin
Haitao Mi
Dong Yu
Published in: CoRR (2023)
Keyphrases
computational model
probabilistic model
theoretical analysis
management system
probability distribution
agent model
hybrid model
bayesian framework
closed form
parameter estimation
real time
cost function
prior knowledge
objective function
high level
information systems
data sets