Combining coregularization and consensus-based self-training for multilingual text categorization.
Massih-Reza Amini, Cyril Goutte, Nicolas Usunier
Published in: SIGIR (2010)
Keyphrases
- text categorization
- semi-supervised learning
- cross-language
- text classification
- feature selection
- information gain
- kNN
- multi-label
- unlabeled data
- training set
- naive Bayes
- automated text categorization
- k-nearest neighbor
- semi-supervised
- co-training
- automatic text categorization
- Reuters corpus
- feature weighting
- term frequency
- text collections
- text documents
- labeled data
- machine learning
- unsupervised learning
- cross-lingual
- text classifiers
- supervised learning
- training data
- support vector
- natural language
- training documents
- language model
- feature selections
- semantic browsing