Evaluating Models of Latent Document Semantics in the Presence of OCR Errors.
Daniel David Walker, William B. Lund, Eric K. Ringger
Published in: EMNLP (2010)
Keyphrases
- document images
- information retrieval
- document processing
- model selection
- document collections
- recognition errors
- web documents
- information retrieval systems
- digital libraries
- prior knowledge
- database
- preprocessing
- semantic information
- statistical models
- optical character recognition
- latent topics
- feature selection
- neural network