Hadoop Based Parallel Deduplication Method for Web Documents.

Junjie Song, Jin Liu, Yuhui Zheng
Published in: CSA/CUTE (2017)
Keyphrases
  • web documents
  • similarity measure
  • open source
  • document classification
  • web pages
  • training set
  • information extraction
  • semi-supervised
  • cloud computing
  • parallel implementation
  • MapReduce