An investigation of linguistic features and clustering algorithms for topical document clustering.
Vasileios Hatzivassiloglou, Luis Gravano, Ankineedu Maganti
Published in: SIGIR (2000)
Keyphrases
- document clustering
- linguistic features
- semantic features
- clustering algorithm
- text documents
- named entities
- text mining
- text classification
- document collections
- k-means
- clustering method
- topic models
- tf-idf
- keywords
- cluster analysis
- vector space model
- sentence level
- feature set
- named entity recognition
- pairwise constraints
- data mining
- supervised learning
- kNN
- cross language information retrieval
- artificial intelligence
- machine learning