Parallelized Near-Duplicate Document Detection Algorithm for Large Scale Chinese Web Pages.
Yongzhuang Wei, Shuai Wang, Chunfeng Yuan, Yihua Huang
Published in: PDCAT (2012)
Keyphrases
- detection algorithm
- web pages
- web documents
- keywords
- keyword extraction
- detection rate
- detection method
- page segmentation
- website
- motion detection
- detection accuracy
- textual content
- search engine
- text summarization
- corner detection
- boundary detection
- outlier detection
- feature detection
- document images
- html documents
- text documents
- document collections
- information retrieval systems
- information retrieval
- retrieval systems
- social annotations
- html pages
- moving objects
- document clustering
- vehicle detection
- fall detection
- harris corner
- frame difference
- face recognition
- image retrieval
- web search
- bag of words
- document representation
- relevant documents