Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback.

Wei Shen, Rui Zheng, WenYu Zhan, Jun Zhao, Shihan Dou, Tao Gui, Qi Zhang, Xuanjing Huang
Published in: CoRR (2023)
Keyphrases