Measuring corpus homogeneity using a range of measures for inter-document distance.
Gabriela Cavaglia
Published in: LREC (2002)
Keyphrases
- intra class
- document corpus
- wide range
- document level
- distance measure
- document collections
- temporal expressions
- retrieval systems
- document clustering
- information retrieval
- information retrieval systems
- document representation
- document retrieval
- text corpus
- similar documents
- document classification
- web documents
- scientific papers
- document images
- document analysis
- text collections
- affinity measure
- keywords
- text categorization
- manually annotated
- text corpora
- distance function
- text documents
- semantic information
- vector space model
- evaluation measures
- training documents
- word frequency
- topic detection and tracking
- text data
- automatic summarization
- multiword
- word sense
- noun phrases
- range data
- relevant documents
- bag of words
- euclidean distance
- image segmentation