Document image OCR accuracy prediction via latent Dirichlet allocation.
Xujun Peng, Huaigu Cao, Prem Natarajan
Published in: ICDAR (2015)
Keyphrases
- document images
- latent Dirichlet allocation
- topic models
- optical character recognition
- topic modeling
- document analysis
- document image analysis
- scanned documents
- document processing
- generative model
- OCR systems
- printed documents
- page layout
- text mining
- text lines
- LDA model
- character recognition
- pattern recognition
- latent topics
- Gibbs sampling
- data mining
- printed text
- variational Bayesian inference