UniMax: Fairer and more Effective Language Sampling for Large-Scale Multilingual Pretraining.
Hyung Won Chung, Noah Constant, Xavier Garcia, Adam Roberts, Yi Tay, Sharan Narang, Orhan Firat
Published in: CoRR (2023)
Keyphrases
- databases
- language specific
- monte carlo
- programming language
- language independent
- language learning
- data sets
- real life
- machine learning
- digital libraries
- probabilistic model
- bayesian networks
- description logics
- n gram
- case study
- artificial intelligence
- small scale
- random sampling
- information retrieval
- multilingual documents