Are All the Datasets in Benchmark Necessary? A Pilot Study of Dataset Evaluation for Text Classification.
Yang Xiao, Jinlan Fu, See-Kiong Ng, Pengfei Liu
Published in: NAACL-HLT (2022)
Keyphrases
- pilot study
- text classification
- text classification tasks
- bag of words
- PASCAL VOC
- benchmark datasets
- synthetic datasets
- training dataset
- feature selection
- text data
- machine learning
- text categorization
- high dimensional datasets
- computer games
- UCI datasets
- database
- massive datasets
- experimental conditions
- text mining
- text classifiers
- real world
- middle school students
- pilot project
- kNN