RCV1: A New Benchmark Collection for Text Categorization Research.
David D. Lewis, Yiming Yang, Tony G. Rose, Fan Li
Published in: J. Mach. Learn. Res. (2004)
Keyphrases
- text categorization
- text collections
- automatic categorization
- text classification
- feature selection
- knn
- information gain
- reuters corpus
- k nearest neighbor
- text classifiers
- multi label
- automated text categorization
- text documents
- semi supervised learning
- document categorization
- automatic text categorization
- feature weighting
- naive bayes
- document collections
- term weighting
- term selection
- data sets
- feature extraction
- unlabeled data
- semi supervised
- decision trees
- feature selections