Deep reinforcement learning from human preferences.

Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei

Published in: CoRR (2017)

Keyphrases

reinforcement learning
decision making
human interaction
multi-agent
preference relations
function approximation
multi-attribute
user preferences
human behavior
robotic control
learning algorithm
temporal difference
optimal control
computational models
human experts
supervised learning
learning process