Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning.

Tianduo Wang, Shichen Li, Wei Lu
Published in: CoRR (2024)