Do not crawl in the DUST: Different URLs with similar text.
Ziv Bar-Yossef
Idit Keidar
Uri Schonfeld
Published in:
ACM Trans. Web (2009)
Keyphrases
web pages
search engine
website
web search
web crawler
text mining
web documents
database
query logs
string matching
information retrieval
neural network
web search engines
text retrieval
anchor text
sentence level
data sets