Improving text classification accuracy using topic modeling over an additional corpus.
Somnath Banerjee
Published in: SIGIR (2008)
Keyphrases
- topic modeling
- text corpora
- classification accuracy
- text mining
- topic models
- scientific articles
- text documents
- text corpus
- latent semantic analysis
- text data
- text classification
- feature selection
- latent dirichlet allocation
- information retrieval
- text analysis
- training set
- latent topics
- knowledge discovery
- text collections
- databases
- keywords
- collaborative filtering
- scientific literature
- text processing
- computational linguistics
- data analysis
- probabilistic topic models
- lda model
- feature space
- multiword
- k-means
- machine learning
- wordnet
- support vector
- information extraction
- natural language processing
- generative model
- document clustering