nmT5 - Is parallel data still relevant for pre-training massively multilingual language models?

Mihir Kale, Aditya Siddhant, Rami Al-Rfou, Linting Xue, Noah Constant, Melvin Johnson
Published in: ACL/IJCNLP (2) (2021)
Keyphrases
  • language model
  • training data
  • retrieval model
  • language modeling
  • feature selection
  • n-gram
  • information retrieval
  • decision trees
  • error rate
  • parallel computing
  • statistical language models