Evaluation of Text Clustering Algorithms with N-Gram-Based Document Fingerprints.
Javier Parapar, Alvaro Barreiro
Published in: ECIR (2009)
Keyphrases
- clustering algorithm
- document clustering
- text documents
- information retrieval
- text clustering
- keywords
- web documents
- textual content
- text mining
- document processing
- digital documents
- document collections
- document content
- multimedia documents
- keyword extraction
- textual documents
- document analysis
- latent semantic analysis
- k-means
- document images
- scientific documents
- document set
- text data
- information retrieval systems
- text summarization
- scientific papers
- text retrieval
- text collections
- document level
- document retrieval
- document structure
- text content
- relevant documents
- automatic summarization
- retrieval engine
- data clustering
- text lines
- fuzzy clustering
- document corpus