Optical character recognition errors and their effects on natural language processing.
Daniel P. Lopresti
Published in: Int. J. Document Anal. Recognit. (2009)
Keyphrases
- optical character recognition
- natural language processing
- character recognition
- text recognition
- document images
- OCR systems
- information extraction
- machine learning
- character segmentation
- text mining
- page segmentation
- printed documents
- text processing
- residual errors
- handwriting recognition
- WordNet
- natural language
- error analysis
- question answering
- scanned documents
- image binarization
- machine translation
- probabilistic model
- text extraction
- word spotting
- computer vision
- historical manuscripts
- artificial intelligence