The evolution of a crawling strategy for an academic document search engine: whitelists and blacklists.
Jian Wu, Pradeep B. Teregowda, Juan Pablo Fernández Ramírez, Prasenjit Mitra, Shuyi Zheng, C. Lee Giles
Published in: WebSci (2012)
Keyphrases
- search engine
- keywords
- retrieval systems
- information retrieval
- google scholar
- user queries
- web pages
- web search
- result merging
- web crawling
- document images
- current web search engines
- search queries
- web search engines
- relevant content
- text documents
- web retrieval
- document retrieval
- structured documents
- web documents
- document collections
- web crawler
- search result
- relevance ranking
- evolutionary game theory
- retrieval strategies
- document representation
- document clustering
- web mining
- text mining
- website