MADLAD-400: A Multilingual And Document-Level Large Audited Dataset.
Sneha Kudugunta
Isaac Caswell
Biao Zhang
Xavier Garcia
Derrick Xin
Aditya Kusupati
Romi Stella
Ankur Bapna
Orhan Firat
Published in: NeurIPS (2023)
Keyphrases
document level
sentiment classification
sentence level
language model
query expansion
cross lingual
digital libraries
data mining
information retrieval
language independent
coreference resolution