Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback.
Hamish Ivison
Yizhong Wang
Jiacheng Liu
Zeqiu Wu
Valentina Pyatkin
Nathan Lambert
Noah A. Smith
Yejin Choi
Hannaneh Hajishirzi
Published in:
CoRR (2024)
Keyphrases
learning process
learning systems
learning algorithm
supervised learning
knowledge acquisition
neural network
artificial intelligence
learning scheme
inductive inference
case study
reinforcement learning
probabilistic model
online learning
mobile learning
learning tasks