Exploiting N-gram Importance and Wikipedia based Additional Knowledge for Improvements in GAAC based Document Clustering.
Niraj Kumar, Venkata Vinay Babu Vemula, Kannan Srinathan, Vasudeva Varma
Published in: KDIR (2010)
Keyphrases
- n-gram
- document clustering
- additional knowledge
- document representation
- document collections
- bag of words
- language model
- text documents
- text classification
- domain knowledge
- clustering algorithm
- document corpus
- text mining
- clustering method
- language independent
- background knowledge
- information retrieval
- vector space model
- tf-idf
- text retrieval
- prior knowledge
- information retrieval systems
- k-means
- document retrieval
- language modeling
- databases
- test collection
- WordNet
- digital libraries
- retrieval model
- named entities
- knowledge base
- probabilistic model
- labeled data