Evaluating the Robustness of Embedding-Based Topic Models to OCR Noise.
Elaine Zosa, Stephen Mutuvi, Mark Granroth-Wilding, Antoine Doucet
Published in: ICADL (2021)
Keyphrases
- topic models
- topic modeling
- latent dirichlet allocation
- latent topics
- probabilistic model
- generative model
- text documents
- latent variables
- text mining
- gibbs sampling
- optical character recognition
- co-occurrence
- text corpora
- data mining
- probabilistic topic models
- vector space
- query expansion
- image classification
- graphical models
- artificial intelligence
- machine learning