Evaluating text categorization in the presence of OCR errors.
Kazem Taghva, Thomas A. Nartker, Julie Borsack, Steven E. Lumos, Allen Condit, Ron Young
Published in: Document Recognition and Retrieval (2001)
Keyphrases
- text categorization
- feature selection
- text classification
- kNN
- document classification
- information gain
- text classifiers
- naive Bayes
- multi-label
- automated text categorization
- Reuters corpus
- text documents
- semantic browsing
- k-nearest neighbor
- tf-idf
- term frequency
- text collections
- term selection
- neural network
- information extraction
- document categorization
- automatic text categorization
- knowledge discovery
- feature selection and classifier
- multi-instance multi-label learning
- data sets