Evaluating the Impact of OCR Errors on Topic Modeling.
Stephen Mutuvi, Antoine Doucet, Moses Odeo, Adam Jatowt
Published in: ICADL (2018)
Keyphrases
- topic modeling
- topic models
- recognition errors
- latent dirichlet allocation
- modeling framework
- text mining
- text classification
- document images
- scientific articles
- optical character recognition
- latent variables
- latent topics
- topic extraction
- text documents
- error correction
- collaborative filtering
- text corpora
- character recognition
- co-occurrence
- pairwise
- search engine
- probabilistic latent semantic analysis
- data mining
- neural network