Confronting Reward Model Overoptimization with Constrained RLHF.

Published in: CoRR (2023)

Keyphrases