Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment.
Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao
Published in: CoRR (2023)
Keyphrases
- pairwise
- optimization algorithm
- optimization problems
- sequence alignment
- global alignment
- multi class
- higher order
- markov random field
- global optimization
- pairwise interactions
- high order
- real time
- asymptotically optimal
- optimization methods
- action selection
- policy making
- optimization method
- image alignment
- similarity function
- optimal policy
- relevance feedback
- data sets