Influence Scores at Scale for Efficient Language Data Sampling.
Nikhil Anand, Joshua Tan, Maria Minakova
Published in: EMNLP (2023)
Keyphrases
- data sets
- data processing
- database
- synthetic data
- global scale
- complex data
- raw data
- data collection
- training data
- high quality
- machine learning
- experimental data
- data structure
- statistical analysis
- neural network
- data analysis
- noisy data
- statistical methods
- sample size
- missing data
- natural language
- object oriented
- small number
- data points
- data sources
- xml documents