Representativeness of latent Dirichlet allocation topics estimated from data samples with application to Common Crawl.
Yuheng Du, Alexander Herzog, André Luckow, Ramu Nerella, Christopher Gropp, Amy W. Apon
Published in: IEEE BigData (2017)
Keyphrases
- latent Dirichlet allocation
- topic models
- data samples
- topic modeling
- latent topics
- topic discovery
- generative model
- LDA model
- text mining
- Gibbs sampling
- data points
- text documents
- variational Bayesian inference
- machine learning
- probabilistic topic models
- probabilistic latent semantic analysis
- maximum likelihood
- co-occurrence
- information extraction
- kNN
- web pages
- topic extraction
- word counts
- information retrieval