String techniques for detecting duplicates in document databases.
Daniel P. Lopresti
Published in: Int. J. Document Anal. Recognit. (2000)
Keyphrases
- databases
- database
- database systems
- retrieval systems
- relational databases
- information retrieval systems
- document images
- data structure
- document retrieval
- text documents
- web documents
- document collections
- document classification
- document clustering
- data warehouse
- knowledge discovery
- keywords
- pattern matching
- semantic information
- data integration
- data model
- clustering algorithm
- cross references