Are All the Datasets in Benchmark Necessary? A Pilot Study of Dataset Evaluation for Text Classification.
Yang Xiao, Jinlan Fu, See-Kiong Ng, Pengfei Liu
Published in: CoRR (2022)
Keyphrases
- pilot study
- text classification
- text classification tasks
- bag of words
- benchmark datasets
- synthetic datasets
- training dataset
- pascal voc
- text data
- text categorization
- high dimensional datasets
- computer games
- massive datasets
- database
- feature selection
- science learning
- text classifiers
- data sets
- support vector machine
- machine learning
- uci datasets
- epistemological beliefs
- text documents
- n gram
- elementary school
- microarray datasets
- data mining
- neural network
- idea generation
- standard learning algorithms