Probabilistic Retrieval of OCR Degraded Text Using N-Grams.
Stephen M. Harding, W. Bruce Croft, C. Weir
Published in: ECDL (1997)
Keyphrases
- n-gram
- probabilistic retrieval
- ocr systems
- optical character recognition
- character n-grams
- document images
- word level
- language model
- character recognition
- text classification
- bag of words
- web documents
- language independent
- information retrieval
- language specific
- scanned documents
- variable length
- document analysis
- part of speech
- text documents
- text mining
- semantic information
- viterbi algorithm
- text retrieval
- word segmentation
- machine vision
- cross-language information retrieval
- nearest neighbor
- information extraction
- keywords
- bayesian networks