Character confusion versus focus word-based correction of spelling and OCR variants in corpora.
Martin Reynaert
Published in: Int. J. Document Anal. Recognit. (2011)
Keyphrases
- optical character recognition
- printed documents
- text recognition
- character recognition
- error correction
- printed text
- word spotting
- preprocessing
- document images
- document image retrieval
- handwriting recognition
- recognition errors
- text input
- keywords
- end-to-end
- OCR systems
- handwritten words
- word frequency
- character segmentation
- document image analysis
- text lines
- post processing
- cursive handwriting
- page layout
- text localization and recognition
- spelling correction
- handwritten characters
- training corpus
- license plate
- n-gram
- co-occurrence
- natural language processing
- hidden Markov models