Memorization vs. Generalization : Quantifying Data Leakage in NLP Performance Evaluation.
Aparna Elangovan, Jiayuan He, Karin Verspoor
Published in: EACL (2021)
Keyphrases
- data sets
- data analysis
- raw data
- data quality
- data collection
- training data
- data structure
- database
- data processing
- data distribution
- synthetic data
- image data
- high quality
- information extraction
- social media
- original data
- historical data
- input data
- statistical analysis
- probability distribution
- missing data
- network structure
- statistical methods
- high dimensional
- complex data