Validation of Text Clustering Based on Document Contents.
Jarmo Toivonen, Ari Visa, Tomi Vesanen, Barbro Back, Hannu Vanharanta
Published in: MLDM (2001)
Keyphrases
- textual content
- text content
- document structure
- text documents
- digital documents
- web documents
- document analysis
- document processing
- keywords
- information retrieval
- document content
- multimedia documents
- printed documents
- database
- tolerance rough set
- document images
- web pages
- information extraction
- text mining
- text corpus
- text collections
- semantic information
- document collections
- multimedia
- document categorization
- logical structure
- scientific papers
- text retrieval
- html pages
- text clustering
- document corpus
- scientific documents
- extractive summarization
- structured documents
- textual information
- document classification
- free text
- document clustering
- automatic text summarization
- scanned documents
- automatic summarization
- page layout analysis
- metadata
- document retrieval
- textual documents
- latent semantic analysis
- technical papers
- electronic documents
- document set
- text lines
- text representation
- related documents
- keyword extraction
- handwritten documents
- authorship attribution
- handwritten text
- retrieval engine
- relevant documents
- text categorization
- information retrieval systems
- digital libraries