Parameterized generation of labeled datasets for text categorization based on a hierarchical directory.
Dmitry Davidov, Evgeniy Gabrilovich, Shaul Markovitch
Published in: SIGIR (2004)
Keyphrases
- text categorization
- hierarchical text categorization
- unlabeled documents
- training documents
- text classification
- linear svm
- multi-label
- hierarchical structure
- information gain
- knn
- text collections
- feature selection
- document classification
- semi-supervised learning
- k nearest neighbor
- automated text categorization
- classify documents
- text documents
- reuters corpus
- naive bayes
- tf-idf
- data sets
- unlabeled data
- training set
- text classifiers
- feature weighting
- document frequency
- automatic text categorization
- multi-instance multi-label learning
- feature selections
- term frequency
- supervised learning
- nearest neighbor
- training data
- decision trees
- bag of words
- unsupervised learning
- pairwise
- neural network