LIRE: listwise reward enhancement for preference alignment.
Mingye Zhu, Yi Liu, Lei Zhang, Junbo Guo, Zhendong Mao
Published in: ACL (Findings) (2024)
Keyphrases
- learning to rank
- pairwise
- loss function
- reinforcement learning
- ranking algorithm
- balancing exploration and exploitation
- ranking functions
- multiple imputation
- evaluation measures
- query dependent
- learning to rank algorithms
- web search
- information retrieval
- user preferences
- missing data
- statistical databases
- image processing
- semi supervised
- multi attribute
- multi class
- document retrieval
- missing values
- supervised learning
- incomplete data
- multi criteria
- relevance judgments
- ranking svm
- decision trees
- machine learning