A multistrategy approach for digital text categorization from imbalanced documents.
M. Dolores del Castillo, Jose Ignacio Serrano
Published in: SIGKDD Explor. (2004)
Keyphrases
- text categorization
- text documents
- automatic text categorization
- document classification
- automatic categorization
- training documents
- document categorization
- text classifiers
- text collections
- term frequency
- term selection
- text classification
- classify documents
- feature selection
- knn
- information gain
- distributional clustering
- reuters corpus
- multi label
- naive bayes
- document frequency
- term weighting
- k nearest neighbor
- document clustering
- information retrieval
- semi supervised learning
- word frequency
- document representation
- document retrieval
- metadata
- data mining
- neural network
- feature selections
- data sets
- tf idf
- query terms
- web documents
- unlabeled data
- semi supervised
- knowledge discovery
- machine learning