A new method on the detection of near-replicas of web pages.
Jia-heng Zheng, Li-xia Wei, Hongye Tan
Published in: CIT (2008)
Keyphrases
- detection method
- search engine
- similarity measure
- dynamic programming
- objective function
- pairwise
- significant improvement
- cost function
- clustering method
- computational cost
- probabilistic model
- fault tolerance
- data sets
- detection rate
- web documents
- detection algorithm
- edge detection
- experimental evaluation
- feature space
- computational complexity
- keywords
- neural network