Using the Web as corpus for self-training text categorization.
Rafael Guzmán-Cabrera, Manuel Montes-y-Gómez, Paolo Rosso, Luis Villaseñor Pineda
Published in: Inf. Retr. (2009)
Keyphrases
- text categorization
- semi supervised learning
- text classification
- text collections
- feature selection
- information gain
- training documents
- multi label
- knn
- textual data
- reuters corpus
- website
- text documents
- web pages
- k nearest neighbor
- document classification
- automated text categorization
- web content
- naive bayes
- web documents
- automatic text categorization
- unlabeled data
- training corpus
- text classifiers
- word frequency
- distributional clustering
- feature selection for text categorization
- co training
- term frequency
- term weighting
- user generated content
- tf idf
- semi supervised
- support vector machine
- multi instance multi label learning
- information retrieval
- machine learning