Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning.
Tianduo Wang
Shichen Li
Wei Lu
Published in:
ACL (1) (2024)
Keyphrases
optimization algorithm
knowledge representation
global optimization
optimization problems
knowledge base
semi-supervised
reasoning systems
semi-supervised learning
optimization method
reasoning process
user preferences
optimization process
optimization model
qualitative reasoning
discrete optimization