Mitigating Reward Hacking via Information-Theoretic Reward Modeling.

Yuchun Miao, Sen Zhang, Liang Ding, Rong Bao, Lefei Zhang, Dacheng Tao
Published in: CoRR (2024)