Fixing the Threshold for Effective Detection of Near Duplicate Web Documents in Web Crawling.

V. A. Narayana, P. Premchand, A. Govardhan

Published in: ADMA (1) (2010)

Keyphrases

web documents
web crawling
topic specific
focused crawling
web data
web pages
semi structured
web mining
web search engines
search engine
information extraction
deep web
link structure
vector space model
database
data mining
web content
databases
text categorization
wordnet
relational databases
database systems
information retrieval