Do not crawl in the DUST: different URLs with similar text.
Uri Schonfeld
Ziv Bar-Yossef
Idit Keidar
Published in: WWW (2006)
Keyphrases
web pages
web crawler
website
text mining
database
keywords
information retrieval
search engine
feature selection
web search
text documents
text retrieval
textual information
natural language generation
metadata
digital libraries
web documents
sentence level
text corpus