Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data.
Yaoqing Yang, Ryan Theisen, Liam Hodgkinson, Joseph E. Gonzalez, Kannan Ramchandran, Charles H. Martin, Michael W. Mahoney
Published in: CoRR (2022)
Keyphrases
- data sets
- raw data
- natural language processing
- data collection
- historical data
- data quality
- data processing
- training data
- database
- training examples
- accurate models
- synthetic data
- data sources
- prior knowledge
- data analysis
- data structure
- probability distribution
- high dimensional data
- test data
- statistical methods
- databases
- learned models
- decision trees
- statistical analysis
- test set
- high dimensional
- semantic relations