Deep reinforcement learning from human preferences.
Paul F. Christiano
Jan Leike
Tom B. Brown
Miljan Martic
Shane Legg
Dario Amodei
Published in:
CoRR (2017)
Keyphrases
reinforcement learning
decision making
human interaction
multi agent
preference relations
function approximation
multi attribute
user preferences
human behavior
robotic control
learning algorithm
temporal difference
optimal control
computational models
human experts
supervised learning
learning process