Naïve Clustering of a large XML Document Collection.
Antoine Doucet, Helena Ahonen-Myka
Published in: INEX Workshop (2002)
Keyphrases
- document collections
- document clustering
- xml retrieval
- test collection
- information retrieval systems
- document retrieval
- information retrieval
- document clusters
- bayes classifiers
- xml documents
- bayesian classifiers
- digital libraries
- document representation
- clustering algorithm
- index terms
- cross-language
- clustering method
- relevant documents
- xml data
- k-means
- metadata
- text retrieval
- text categorization
- xml schema
- relational databases
- ad hoc retrieval
- document archives
- xml queries
- cluster analysis
- text mining
- database
- vector space model
- data model
- query processing
- information extraction
- document set
- semi-structured
- document structure
- content and structure
- similarity measure
- related documents
- learning algorithm
- text classification