Online duplicate document detection: signature reliability in a dynamic retrieval environment.
Jack G. Conrad, Xi S. Guo, Cindy P. Schriber
Published in: CIKM (2003)
Keyphrases
- dynamic environments
- retrieval systems
- information retrieval systems
- information retrieval
- document retrieval
- signature file
- online environment
- real time
- online learning
- structured documents
- changing environment
- page segmentation
- image retrieval
- document analysis
- retrieval engine
- object detection
- mobile robot
- effective retrieval
- document ranking
- document clustering
- relevant documents
- digital libraries
- test collection
- TREC Web
- query terms
- relevance feedback
- image database
- query specific
- query expansion
- retrieval quality
- index terms
- signature verification
- passage retrieval
- web documents
- detection method
- text retrieval
- document level
- detection algorithm
- document collections
- document structure
- document representation
- tf-idf