ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback.

Ju-Seung Byun, Jiyun Chun, Jihyung Kil, Andrew Perrault
Published in: CoRR (2024)