Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback.
Chenliang Li
Siliang Zeng
Zeyi Liao
Jiaxiang Li
Dongyeop Kang
Alfredo García
Mingyi Hong
Published in:
CoRR (2024)
Keyphrases
preference learning
ordinal regression
gaussian processes
pairwise comparison
recommender systems
closed form
similarity measure
training data