The effect of OCR errors on stylistic text classification.
Sterling Stuart Stein, Shlomo Argamon, Ophir Frieder
Published in: SIGIR (2006)
Keyphrases
- text classification
- recognition errors
- bag of words
- feature selection
- text categorization
- optical character recognition
- text mining
- text data
- multi-label
- machine learning
- n-gram
- labeled data
- naive Bayes
- preprocessing
- database
- digital libraries
- document images
- document processing
- kNN
- printed documents
- document analysis
- data cleaning
- text classifiers
- error analysis
- document classification
- character recognition
- sentiment analysis
- post processing