An efficient document clustering algorithm and its application to a document browser.
Hideki Tanaka, Tadashi Kumano, Noriyoshi Uratani, Terumasa Ehara
Published in: Inf. Process. Manag. (1999)
Keyphrases
- document clustering
- document collections
- document representation
- text mining
- clustering method
- clustering algorithm
- document similarity
- document clusters
- tolerance rough set
- text documents
- text clustering
- topic extraction
- vector space model
- tf-idf
- k-means
- document categorization
- data mining
- cluster analysis
- similar documents
- document corpus
- automatic categorization
- document classification
- named entities
- information retrieval systems
- natural language processing
- document set
- probabilistic model
- pairwise constraints
- web pages
- artificial intelligence
- information retrieval
- machine learning