Deduplication of Scholarly Documents using Locality Sensitive Hashing and Word Embeddings.
Bikash Gyawali, Lucas Anastasiou, Petr Knoth
Published in: LREC (2020)
Keyphrases
- locality sensitive hashing
- binary codes
- similarity search
- vector space
- nearest neighbor search
- nearest neighbor
- document collections
- brute force
- keywords
- hash functions
- space efficient
- kNN
- information retrieval
- hamming distance
- metric space
- xml documents
- co-occurrence
- document retrieval
- sift features
- indexing techniques
- distance function
- multimedia documents
- multimedia retrieval
- range queries
- digital libraries
- n-gram
- retrieval systems
- database
- high dimensional data
- information retrieval systems
- similarity measure
- metadata
- visual features
- scale space
- databases
- data sets