Measuring XML Document Similarity: A Case Study for Evaluating Information Extraction Systems.
Gerardo Canfora, Luigi Cerulo, Rita Scognamiglio. Published in: IEEE METRICS (2004)
Keyphrases
- document similarity
- information extraction systems
- information extraction
- free text
- xml documents
- document clustering
- graph theory
- text documents
- document representation
- structured data
- text processing
- semi structured
- cosine similarity
- data model
- vector space model
- relevance model
- metadata
- natural language text
- databases
- semantic similarity
- latent dirichlet allocation
- clustering method
- natural language processing
- pairwise
- natural language
- feature extraction
- feature selection