Use of text syntactical structures in detection of document duplicates.
Mohamed Elhadi, Amjad Al-Tobi. Published in: ICDIM (2008)
Keyphrases
- text documents
- information retrieval
- digital documents
- web documents
- keywords
- document processing
- document analysis
- text collections
- scientific papers
- text content
- text retrieval
- latent semantic analysis
- natural language text
- multimedia documents
- text lines
- textual documents
- semantic information
- database
- textual content
- document categorization
- text clustering
- keyword extraction
- extractive summarization
- technical papers
- text mining
- structured documents
- document classification
- document set
- detection method
- text corpus
- document content
- electronic documents
- document collections
- text detection
- user queries
- automatic text summarization
- document retrieval
- information extraction
- curvilinear structures
- detection algorithm
- scientific documents
- authorship attribution
- object detection
- information retrieval systems
- text categorization
- automatic summarization
- document structure
- document images
- document clustering
- textual data
- text classifiers