Fixing the Threshold for Effective Detection of Near Duplicate Web Documents in Web Crawling.
V. A. Narayana, P. Premchand, A. Govardhan. Published in: ADMA (1) (2010)
Keyphrases
- web documents
- web crawling
- topic specific
- focused crawling
- web data
- web pages
- semi structured
- web mining
- web search engines
- search engine
- information extraction
- deep web
- link structure
- vector space model
- database
- data mining
- web content
- databases
- text categorization
- wordnet
- relational databases
- database systems
- information retrieval