Back to Basics: Revisiting REINFORCE-Style Optimization for Learning from Human Feedback in LLMs.

Published in: ACL 2024 (Volume 1: Long Papers)

Keyphrases