SmartMD: A High Performance Deduplication Engine with Mixed Pages.
Fan Guo
Yongkun Li
Yinlong Xu
Song Jiang
John C. S. Lui
Published in:
USENIX Annual Technical Conference (2017)