Searching OCR'ed Text: An LDA Based Approach.
Ehtesham Hassan, Vikram Garg, S. K. Mirajul Haque, Santanu Chaudhury, Madan Gopal
Published in: ICDAR (2011)
Keyphrases
- text recognition
- optical character recognition
- document processing
- printed documents
- linear discriminant analysis
- face recognition
- latent Dirichlet allocation
- document analysis
- latent semantic analysis
- text extraction
- information retrieval
- topic models
- text mining
- OCR systems
- post processing
- generative model
- document images
- discriminant analysis
- free text
- dimensionality reduction
- principal component analysis
- keywords
- character recognition
- semantic information
- printed text
- word counts
- error correction
- text retrieval
- string matching
- scanned documents
- preprocessing
- feature extraction
- text analysis
- text corpora
- page layout