Mini But Mighty: Efficient Multilingual Pretraining with Linguistically-Informed Data Selection.
Tolúlopé Ògúnrèmí, Dan Jurafsky, Christopher D. Manning
Published in: EACL (Findings) (2023)
Keyphrases
- data sets
- data structure
- data sources
- statistical analysis
- database
- databases
- experimental data
- training data
- high quality
- data analysis
- synthetic data
- data collection
- original data
- raw data
- data distribution
- computer systems
- small number
- relational databases
- spatial data
- application domains
- decision trees
- multimedia
- data acquisition
- missing values
- multimedia data
- feature selection
- genetic algorithm
- historical data