A Case Study of Rule Based and Probabilistic Word Error Correction of Portuguese OCR Text in a "Real World" Environment for Inclusion in a Digital Library.
Brett Drury, José João Almeida
Published in: Int. J. Comput. Linguistics Appl. (2010)
Keyphrases
- error correction
- digital libraries
- digital documents
- printed documents
- document processing
- error detection
- word pairs
- sentence level
- data hiding
- word level
- information retrieval
- error correcting
- english text
- printed text
- text retrieval
- keywords
- bayesian networks
- natural language text
- document analysis
- error detection and correction
- error control
- channel coding
- information access
- ldpc codes
- noun phrases
- optical character recognition
- historical manuscripts
- text mining
- text extraction
- document images
- word sense disambiguation
- co-occurrence
- bit errors
- information theoretic
- text recognition
- turbo codes
- recognition errors
- page layout
- handwritten documents
- cross language
- reed solomon
- watermarking scheme