Self-Explore to Avoid the Pit: Improving the Reasoning Capabilities of Language Models with Fine-grained Rewards.

Published in: CoRR (2024)
