Multilingual De-Duplication Strategies: Applying scalable similarity search with monolingual & multilingual embedding models.
Stefan Pasch, Dimitirios Petridis, Jannic Cutura
Published in: CoRR (2024)
Keyphrases
- similarity search
- cross-lingual
- cross-language
- metric space
- vector space
- high-dimensional
- multimedia databases
- similarity measure
- cross-language information retrieval
- distance function
- multilingual information retrieval
- high-dimensional data
- similarity searching
- similarity retrieval
- query processing
- binary codes
- pattern recognition
- information retrieval
- bilingual dictionaries
- similarity queries
- question answering
- digital libraries
- R-tree
- machine translation
- data structure
- hash functions
- k-nearest neighbors (kNN)
- text retrieval
- cross-view