iREPO: implicit Reward Pairwise Difference based Empirical Preference Optimization.
Long Tan Le, Han Shu, Tung-Anh Nguyen, Choong Seon Hong, Nguyen Hoang Tran
Published in: CoRR (2024)
Keyphrases
- pairwise
- reinforcement learning
- similarity measure
- optimization algorithm
- multi-class
- discrete optimization
- machine learning
- markov random field
- theoretical analysis
- user preferences
- optimization process
- combinatorial optimization
- pairwise interactions
- constrained optimization
- similarity function
- spectral clustering
- global optimization
- optimization method
- high-order
- information-theoretic
- linear programming
- semi-supervised