Distributed Gibbs Sampling and LDA Modelling for Large Scale Big Data Management on PySpark.
Christos N. Karras, Aristeidis Karras, Dimitrios Tsolis, Konstantinos C. Giotopoulos, Spyros Sioutas
Published in: SEEDA-CECNSM (2022)
Keyphrases
- gibbs sampling
- latent dirichlet allocation
- topic models
- data management
- lda model
- markov chain
- parameter estimation
- topic modeling
- generative model
- approximate inference
- belief networks
- text mining
- distributed systems
- em algorithm
- expectation maximization
- exact inference
- graphical models
- dimensionality reduction
- maximum likelihood
- variational inference
- machine learning
- linear discriminant analysis
- co-occurrence
- probabilistic inference
- probabilistic model
- prior knowledge
- feature extraction
- face recognition
- image segmentation