Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs.

Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker
Published in: CoRR (2024)
Keyphrases