Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs.
Arash Ahmadian
Chris Cremer
Matthias Gallé
Marzieh Fadaee
Julia Kreutzer
Olivier Pietquin
Ahmet Üstün
Sara Hooker
Published in:
CoRR (2024)
Keyphrases
learning process
learning algorithm
reinforcement learning
learning systems
active learning
language acquisition
online learning
decision trees
prior knowledge
knowledge acquisition
human subjects
learning tasks
learning problems
human experts
tutorial dialogue
learning experience
optimization problems