Countering Reward Over-Optimization in LLM with Demonstration-Guided Reinforcement Learning.

Published in: ACL (Findings) (2024)