Hadoop Based Parallel Deduplication Method for Web Documents.
Junjie Song
Jin Liu
Yuhui Zheng
Published in:
CSA/CUTE (2017)
Keyphrases
web documents
similarity measure
open source
document classification
web pages
training set
information extraction
semi supervised
cloud computing
parallel implementation
map reduce