Back to Basics: Revisiting REINFORCE-Style Optimization for Learning from Human Feedback in LLMs.
Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker
Published in: ACL (1) (2024)
Keyphrases
- learning process
- learning systems
- human learning
- prior knowledge
- machine learning
- elementary school
- learning mechanism
- human subjects
- human experts
- learning problems
- learning tasks
- optimization method
- optimization algorithm
- supervised learning
- reinforcement learning
- online learning
- active learning
- artificial intelligence
- data sets
- motor skills