A Sentence-Based Copy Detection Approach for Web Documents.
Rajiv Yerra, Yiu-Kai Ng
Published in: FSKD (1) (2005)
Keyphrases
- web documents
- copy detection
- information extraction
- natural language
- semi-structured
- web pages
- web search engines
- keywords
- document classification
- web content
- textual information
- sentence level
- part of speech
- structured documents
- document representation
- web data
- n-gram
- link structure
- vector space model
- parse tree
- semantic role labeling
- html documents
- topic specific
- text corpus
- web directories
- focused crawling
- unstructured documents