Collection statistics for fast duplicate document detection.
Abdur Chowdhury, Ophir Frieder, David A. Grossman, M. Catherine McCabe
Published in: ACM Trans. Inf. Syst. (2002)
Keyphrases
- document collections
- information retrieval
- automatic detection
- false positives
- document images
- information retrieval systems
- object detection
- detection algorithm
- detection rate
- detection method
- document clustering
- database
- distributed information retrieval
- related documents
- effective retrieval
- complex background
- text collections
- detection accuracy
- false alarms
- document classification
- statistical methods
- relevant documents
- probabilistic model
- keywords