The Chinese Duplicate Web Pages Detection Algorithm based on Edit Distance.
Junxiu An, Pengsen Cheng
Published in: J. Softw. (2013)
Keyphrases
- detection algorithm
- edit distance
- web pages
- graph matching
- search engine
- detection method
- edit operations
- string matching
- similarity measure
- detection accuracy
- distance function
- string edit distance
- levenshtein distance
- distance measure
- outlier detection
- approximate string matching
- keywords
- web documents
- string similarity
- tree structured data
- graph edit distance
- moving objects
- chinese characters
- tree edit distance
- dynamic programming
- normalized edit distance
- harris corner
- approximate matching
- similarity join
- feature points
- text classification