ClusterDataSplit: Exploring Challenging Clustering-Based Data Splits for Model Performance Evaluation.
Hanna Wecker, Annemarie Friedrich, Heike Adel
Published in: Eval4NLP (2020)
Keyphrases
- data sets
- experimental data
- simulation data
- probabilistic model
- probability distribution
- training data
- formal model
- test data
- statistical model
- data collection
- input data
- database
- high level
- historical data
- computational model
- synthetic data
- high quality
- neural network
- measured data
- learning models
- data analysis
- expert knowledge
- original data
- data sources
- small number
- management system
- statistical analysis
- data points
- spatial data
- sensor data
- mathematical model
- missing data
- image data
- data structure
- empirical data
- parameter estimation