The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback.
Nathan O. Lambert
Roberto Calandra
Published in:
CoRR (2023)
Keyphrases
reinforcement learning
mobile robot
state space
human behavior
model free
human interaction
learning algorithm
multi agent
user engagement
dynamic time warping
behavioural cloning
genetic algorithm
human operators
multiple objectives
function approximation
human subjects
learning process