(N, K)-Puzzle: A Cost-Efficient Testbed for Benchmarking Reinforcement Learning Algorithms in Generative Language Model.

Yufeng Zhang, Liyu Chen, Boyi Liu, Yingxiang Yang, Qiwen Cui, Yunzhe Tao, Hongxia Yang
Published in: CoRR (2024)