nmT5 - Is parallel data still relevant for pre-training massively multilingual language models?

Mihir Kale, Aditya Siddhant, Rami Al-Rfou, Linting Xue, Noah Constant, Melvin Johnson
Published in: ACL/IJCNLP (2) (2021)
Keyphrases
  • language model
  • training data
  • retrieval model
  • language modeling
  • feature selection
  • n gram
  • information retrieval
  • decision trees
  • error rate
  • parallel computing
  • statistical language models