Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback.

Chenliang Li, Siliang Zeng, Zeyi Liao, Jiaxiang Li, Dongyeop Kang, Alfredo García, Mingyi Hong
Published in: CoRR (2024)
Keyphrases
  • preference learning
  • ordinal regression
  • gaussian processes
  • pairwise comparison
  • recommender systems
  • closed form
  • similarity measure
  • training data