New system to fingerprint extensible markup language documents using winnowing theory.
Saad M. Darwish
Published in: IET Signal Process. (2012)
Keyphrases
- extensible markup language
- xml documents
- information retrieval
- document classification
- document collections
- markup language
- database
- xml data
- web documents
- document clustering
- document retrieval
- metadata
- relevant documents
- xml schema
- theoretical framework
- information retrieval systems
- feature extraction
- text documents
- semantic information
- low level
- vector space model
- knowledge base
- fingerprint identification