Preference-free Alignment Learning with Regularized Relevance Reward.

Sungdong Kim, Minjoon Seo

Published in: CoRR (2024)

Keyphrases

reinforcement learning
online learning
learning algorithm
learning process
neural network
machine learning
active learning
least squares
learning systems
learning tasks
inductive inference
preference learning
information retrieval
multi-agent
knowledge acquisition
solving problems