Pseudo-document simulation for comparing LDA, GSDMM and GPM topic models on short and sparse text using Twitter data.
Christoph Weisser, Christoph Gerloff, Anton Thielmann, Andre Python, Arik Reuter, Thomas Kneib, Benjamin Säfken
Published in: Comput. Stat. (2023)
Keyphrases
- topic models
- latent dirichlet allocation
- latent topics
- text documents
- topic modeling
- topic discovery
- information retrieval
- data analysis
- text data
- high dimensional data
- microblog posts
- dimensionality reduction
- text mining
- social media
- bag of words
- probabilistic model
- natural language
- text corpora
- feature selection
- machine learning
- databases