Learning Reward Functions from Diverse Sources of Human Feedback: Optimally Integrating Demonstrations and Preferences.
Erdem Biyik
Dylan P. Losey
Malayandi Palan
Nicholas C. Landolfi
Gleb Shevchuk
Dorsa Sadigh
Published in:
CoRR (2020)
Keyphrases
learning algorithm
reinforcement learning
supervised learning
markov decision processes
data mining
prior knowledge
maximum likelihood
utility function