Improving OCR Text Categorization Accuracy with Electronic Abstracts.
Linlin Li, Chew Lim Tan
Published in: DIAL (2006)
Keyphrases
- text categorization
- information gain
- text classification
- knn
- feature selection
- multi label
- k nearest neighbor
- ensemble pruning
- automated text categorization
- document classification
- text classifiers
- naive bayes
- document categorization
- reuters corpus
- text documents
- semi supervised learning
- term frequency
- feature reduction
- feature weighting
- automatic text categorization
- feature selection for text categorization
- semantic browsing
- unlabeled data
- nearest neighbor
- training documents
- classification accuracy
- feature selections
- text collections
- document frequency
- language model
- training set
- transductive support vector machine
- reinforcement learning