MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies.

Shiyue Zhang, Shijie Wu, Ozan Irsoy, Steven Lu, Mohit Bansal, Mark Dredze, David S. Rosenberg
Published in: CoRR (2023)