Interpretable and reconfigurable clustering of document datasets by deriving word-based rules.
Vipin Balachandran, Deepak P, Deepak Khemani
Published in: CIKM (2009)
Keyphrases
- document clustering
- clustering method
- classification rules
- keywords
- text clustering
- k-means
- clustering algorithm
- data mining tasks
- n-gram
- tolerance rough set
- information retrieval
- text corpus
- document images
- retrieval systems
- related words
- high dimensional datasets
- clustering approaches
- synthetic datasets
- association rules
- document image retrieval
- syntactic categories
- spoken document retrieval
- synthetic and real datasets
- co-occurrence
- clustering analysis
- term frequency
- document collections
- cluster analysis
- low cost
- cluster membership
- information retrieval systems
- web documents
- compound words
- word co-occurrence
- document space
- related documents
- printed documents
- data points
- tf-idf
- latent topics
- document representation
- text documents
- outlier detection
- short list
- word sense
- term weighting