Language Model Self-improvement by Reinforcement Learning Contemplation.

Jing-Cheng Pang, Pengyuan Wang, Kaiyuan Li, Xiong-Hui Chen, Jiacheng Xu, Zongzhang Zhang, Yang Yu
Published in: CoRR (2023)
Keyphrases