Clustering Documents Using a Wikipedia-Based Concept Representation.
Anna-Lan Huang, David N. Milne, Eibe Frank, Ian H. Witten
Published in: PAKDD (2009)
Keyphrases
- document collections
- document clustering
- concept space
- clustering algorithm
- document retrieval
- Wikipedia pages
- document corpus
- Wikipedia articles
- text clustering
- web data
- hierarchical clustering
- k-means
- information retrieval
- web documents
- document space
- named entities
- text documents
- semantic information
- data objects
- clustering method
- xml documents
- document representation
- hierarchical tree
- text representation
- cosine similarity
- information retrieval systems
- keywords
- unsupervised learning
- retrieved documents
- data points
- metadata