Entropy-Regularized Token-Level Policy Optimization for Large Language Models.

Muning Wen, Cheng Deng, Jun Wang, Weinan Zhang, Ying Wen
Published in: CoRR (2024)
Keyphrases