nmT5 - Is parallel data still relevant for pre-training massively multilingual language models?
Mihir Kale
Aditya Siddhant
Rami Al-Rfou
Linting Xue
Noah Constant
Melvin Johnson
Published in:
ACL/IJCNLP (2) (2021)
Keyphrases
language model
training data
retrieval model
language modeling
feature selection
n gram
information retrieval
decision trees
error rate
parallel computing
statistical language models