MADLAD-400: A Multilingual And Document-Level Large Audited Dataset.
Sneha Kudugunta
Isaac Caswell
Biao Zhang
Xavier Garcia
Christopher A. Choquette-Choo
Katherine Lee
Derrick Xin
Aditya Kusupati
Romi Stella
Ankur Bapna
Orhan Firat
Published in:
CoRR (2023)
Keyphrases
document level
sentence level
sentiment classification
language model
document retrieval
cross lingual
cross language
machine translation
sentiment analysis
language independent
pseudo relevance feedback
data mining
information retrieval
query expansion