Are All the Datasets in Benchmark Necessary? A Pilot Study of Dataset Evaluation for Text Classification.

Yang Xiao, Jinlan Fu, See-Kiong Ng, Pengfei Liu

Published in: NAACL-HLT (2022)

Keyphrases

pilot study
text classification
text classification tasks
bag of words
PASCAL VOC
benchmark datasets
synthetic datasets
training dataset
feature selection
text data
machine learning
text categorization
high dimensional datasets
computer games
UCI datasets
database
massive datasets
experimental conditions
text mining
text classifiers
real world
middle school students
pilot project
kNN