More Than Words: Collocation Tokenization for Latent Dirichlet Allocation Models.
Jin Cheevaprawatdomrong, Alexandra Schofield, Attapol T. Rutherford
Published in: CoRR (2021)
Keyphrases
- latent dirichlet allocation
- latent topics
- lda model
- topic models
- probabilistic topic models
- mixed membership
- probabilistic latent semantic analysis
- topic modeling
- statistical topic models
- probabilistic model
- latent topic models
- topic discovery
- gibbs sampling
- n gram
- text mining
- keywords
- parameter estimation
- generative model
- variational inference
- pattern recognition
- latent variables
- text corpora
- model selection
- hidden markov models
- active learning
- hierarchical bayesian models
- data mining