Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs.
Xin Lai, Zhuotao Tian, Yukang Chen, Senqiao Yang, Xiangru Peng, Jiaya Jia
Published in: CoRR (2024)
Keyphrases
- step-wise
- optimization process
- knowledge base
- optimization problems
- optimization algorithm
- automated reasoning
- knowledge representation
- optimization method
- combinatorial optimization
- optimization model
- reasoning systems
- multi-criteria
- artificial intelligence
- rule-based reasoning
- CP-nets
- model-based reasoning
- discrete optimization
- reasoning tasks
- qualitative reasoning
- database
- case-based reasoning
- expert systems
- search algorithm