DPO Meets PPO: Reinforced Token Optimization for RLHF.
Han Zhong, Guhao Feng, Wei Xiong, Li Zhao, Di He, Jiang Bian, Liwei Wang. Published in: CoRR (2024)