Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning.
Tianduo Wang, Shichen Li, Wei Lu
Published in: CoRR (2024)
Keyphrases
- optimization algorithm
- optimization problems
- knowledge representation
- global optimization
- artificial intelligence
- optimization methods
- optimization method
- semi-supervised learning
- semi-supervised
- training set
- multi-agent
- knowledge base
- cost-sensitive
- data sets
- automated reasoning
- user preferences
- multi-attribute
- constrained optimization
- reasoning tasks
- reasoning systems
- consistency checking
- model-based reasoning