Cost-effective Selection of Pretraining Data: A Case Study of Pretraining BERT on Social Media.
Xiang Dai, Sarvnaz Karimi, Ben Hachey, Cécile Paris
Published in: CoRR (2020)
Keyphrases
- cost effective
- data sets
- social media
- big data
- database
- raw data
- low cost
- data acquisition
- input data
- data collection
- data sources
- data analysis
- data points
- data processing
- training data
- data quality
- image data
- long term
- computer systems
- statistical analysis
- missing data
- sensor data
- high quality
- social networks
- social media data