MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies.

Shiyue Zhang, Shijie Wu, Ozan Irsoy, Steven Lu, Mohit Bansal, Mark Dredze, David S. Rosenberg
Published in: CoRR (2023)