When Do You Need Billions of Words of Pretraining Data?
Yian Zhang, Alex Warstadt, Xiaocheng Li, Samuel R. Bowman
Published in: ACL/IJCNLP (1) (2021)
Keyphrases
- data structure
- data sets
- data analysis
- synthetic data
- training data
- high quality
- missing data
- information extraction
- complex data
- data quality
- data distribution
- database
- image data
- knowledge discovery
- input data
- data collection
- probability distribution
- experimental data
- end users
- data acquisition
- raw data
- data sources
- neural network
- historical data
- semantic meaning