A new method on the detection of near-replicas of web pages.

Jia-heng Zheng, Li-xia Wei, Hongye Tan

Published in: CIT (2008)

Keyphrases

detection method
search engine
similarity measure
dynamic programming
objective function
pairwise
significant improvement
cost function
clustering method
computational cost
probabilistic model
fault tolerance
data sets
detection rate
web documents
detection algorithm
edge detection
experimental evaluation
feature space
computational complexity
keywords
neural network