CopyCat: Near-Duplicates Within and Between the ClueWeb and the Common Crawl.
Maik Fröbe
Janek Bevendorff
Lukas Gienapp
Michael Völske
Benno Stein
Martin Potthast
Matthias Hagen
Published in:
SIGIR (2021)
Keyphrases
web search
search engine
test collection
trec web track
web pages
deep web
focused crawling
web crawling
web crawlers
neural network
real world
artificial intelligence
knowledge base
high level
evolutionary algorithm