Influence Scores at Scale for Efficient Language Data Sampling.
Nikhil Anand, Joshua Tan, Maria Minakova. Published in: CoRR (2023)
Keyphrases
- raw data
- data sets
- statistical analysis
- database
- data analysis
- synthetic data
- training data
- data quality
- monte carlo
- high quality
- sensor data
- global scale
- computer systems
- data collection
- data processing
- small number
- data points
- end users
- data sources
- image data
- prior knowledge
- xml documents
- high dimensional
- sample size
- training set
- database systems
- data distribution
- search engine
- data objects
- information retrieval
- sampling methods
- data mining