MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies

Shiyue Zhang, Shijie Wu, Ozan Irsoy, Steven Lu, Mohit Bansal, Mark Dredze, David S. Rosenberg
Published in: ACL (1) (2023)