Enhancing Document Clustering through Heuristics and Summary-Based Pre-processing.
Sri Harsha Allamraju, Robert Chun. Published in: HCI (9) (2009)
Keyphrases
- document clustering
- preprocessing
- text mining
- document collections
- clustering algorithm
- clustering method
- feature extraction
- cosine similarity
- text documents
- document representation
- tolerance rough set
- non-negative matrix factorization
- cluster analysis
- vector space model
- document summarization
- topic extraction
- clustering quality
- pairwise constraints
- document clusters
- k means
- information extraction
- databases
- text summarization
- natural language