Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning.

Mathieu Rita, Florian Strub, Rahma Chaabouni, Paul Michel, Emmanuel Dupoux, Olivier Pietquin
Published in: CoRR (2024)