Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs.

Published in: CoRR (2024)
