Combining Latent Dirichlet Allocation and K-Means for Documents Clustering: Effect of Probabilistic Based Distance Measures.
Quang Vu Bui, Karim Sayadi, Soufian Ben Amor, Marc Bui
Published in: ACIIDS (1) (2017)
Keyphrases
- k-means
- distance measure
- latent dirichlet allocation
- cosine similarity
- generative model
- topic discovery
- topic models
- probabilistic topic models
- topic modeling
- lda model
- document clustering
- latent topics
- clustering algorithm
- clustering method
- latent semantic analysis
- generative process
- probabilistic model
- similarity measure
- text documents
- vector space
- expectation maximization
- gibbs sampling
- spectral clustering
- probabilistic latent semantic analysis
- cluster analysis
- text mining
- distance function
- information retrieval
- vector space model
- co-occurrence
- bayesian framework
- text classification
- document representation
- parameter estimation
- dimensionality reduction
- conditional probabilities
- prior knowledge
- feature space
- keywords
- em algorithm