Data, Data Everywhere: A Guide for Pretraining Dataset Construction.
Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, Bo Li, Aastha Jhunjhunwala, Zhilin Wang, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro
Published in: CoRR (2024)
Keyphrases
- data sets
- database
- data processing
- complex data
- temporal information
- data distribution
- experimental data
- application domains
- statistical analysis
- information systems
- data analysis
- data sources
- genetic algorithm
- noisy data
- data acquisition
- case study
- high quality
- missing data
- synthetic data
- data collection
- input data
- image data
- spatial data
- bayesian networks
- data structure
- missing values
- statistical methods
- raw data
- original data
- prior knowledge