Mining Source Code Topics Through Topic Model and Words Embedding.
Wei Emma Zhang, Quan Z. Sheng, Ermyas Abebe, Muhammad Ali Babar, Andi Zhou
Published in: ADMA (2016)
Keyphrases
- source code
- topic models
- latent topics
- software repositories
- text documents
- latent Dirichlet allocation
- text mining
- topic modeling
- statistical topic models
- text corpora
- open source
- text streams
- mining software repositories
- software systems
- LDA model
- word pairs
- software maintenance
- co-occurrence
- probabilistic topic models
- software projects
- probabilistic model
- software evolution
- generative model
- news articles
- latent variables
- topic discovery
- baseline models
- topic tracking
- information extraction
- n-gram
- data mining
- high level
- knowledge discovery
- vector space
- document clustering
- document representation
- bag of words
- text data
- named entities
- real world
- part of speech
- word sense disambiguation
- information retrieval