Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback.
Wei Shen
Rui Zheng
WenYu Zhan
Jun Zhao
Shihan Dou
Tao Gui
Qi Zhang
Xuanjing Huang
Published in:
CoRR (2023)
Keyphrases
reinforcement learning
human operators
function approximation
human behavior
human faces
data transmission
multi agent
total length
motor skills
human subjects
relevance feedback
human experts
model free
reinforcement learning algorithms
state space
multi agent systems
user engagement