Improving Topic Modeling Performance through N-gram Removal.
Mohamad Almgerbi, Andrea De Mauro, Adham Kahlawi, Valentina Poggioni. Published in: WI/IAT (2021)
Keyphrases
- n-gram
- topic modeling
- text classification
- topic models
- language model
- text mining
- language independent
- bag of words
- latent Dirichlet allocation
- language modeling
- variable length
- text categorization
- machine learning
- text documents
- word segmentation
- document classification
- feature selection
- part of speech
- collaborative filtering
- statistical language modeling
- document clustering
- retrieval model
- labeled data
- generative model
- knowledge discovery
- prior knowledge
- text classifiers
- bayesian networks