Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning.

Zhiheng Xi, Wenxiang Chen, Boyang Hong, Senjie Jin, Rui Zheng, Wei He, Yiwen Ding, Shichun Liu, Xin Guo, Junzhe Wang, Honglin Guo, Wei Shen, Xiaoran Fan, Yuhao Zhou, Shihan Dou, Xiao Wang, Xinbo Zhang, Peng Sun, Tao Gui, Qi Zhang, Xuanjing Huang
Published in: CoRR (2024)