When Do You Need Billions of Words of Pretraining Data?
Yian Zhang, Alex Warstadt, Haau-Sing Li, Samuel R. Bowman
Published in: CoRR (2020)
Keyphrases
- statistical analysis
- data sets
- experimental data
- training data
- small number
- multimedia data
- synthetic data
- data mining techniques
- data points
- data analysis
- data structure
- data sources
- data processing
- data collection
- noisy data
- database
- data distribution
- data objects
- original data
- raw data
- historical data
- input data
- image data
- probabilistic model
- high quality
- web pages