Near-Duplication Document Detection Using Weight One Permutation Hashing.
Xinpan Yuan, Songlin Wang, Xiaojun Deng
Published in: J. Comput. Sci. Eng. (2019)
Keyphrases
- detection method
- detection algorithm
- detection accuracy
- data structure
- information retrieval
- web documents
- false positives
- automatic detection
- object detection
- information retrieval systems
- false alarms
- document classification
- signature file
- file organization
- page segmentation
- vector space model
- detection rate
- document collections