Finding Near-Replicas of Documents and Servers on the Web.
Narayanan Shivakumar, Hector Garcia-Molina
Published in: WebDB (1998)
Keyphrases
- web documents
- web data
- multilingual documents
- web information
- digital documents
- website
- document collections
- information retrieval
- web applications
- web pages
- user interests
- document repositories
- document classification
- web mining
- current web search engines
- content similarity
- information retrieval systems
- text information
- structured information
- web queries
- textual data
- fault tolerant
- newspaper articles
- information sources
- web search
- relevant documents
- load balancing
- information extraction
- google scholar
- digital libraries
- keywords
- semi structured
- document retrieval
- retrieval systems
- electronic documents
- web crawler
- open directory project
- database
- search interface
- search tasks
- user generated content
- document clustering
- text documents
- xml documents
- metadata
- search engine