Policy Optimization in RLHF: The Impact of Out-of-preference Data.
Ziniu Li
Tian Xu
Yang Yu
Published in: CoRR (2023)
Keyphrases
data collection
data sets
data structure
small number
statistical analysis
original data
prior knowledge
data points
learning algorithm
social networks
training data
image data
data processing
missing data
data acquisition