Word Embedding based Semantic Cross-Lingual Document Alignment in Comparable Corpora.
Debasis Ganguly, Haithem Afli, Dwaipayan Roy
Published in: FIRE (2018)
Keyphrases
- cross-lingual
- comparable corpora
- word alignment
- word pairs
- parallel corpora
- translation model
- machine translation
- parallel corpus
- document clustering
- language modeling
- cross-lingual information retrieval
- translation probabilities
- bilingual dictionaries
- cross-language information retrieval
- semantic similarity
- semantic information
- source language
- text documents
- language-independent
- tf-idf
- cross-language
- document collections
- statistical machine translation
- text classification
- keywords
- machine translation system
- text corpora
- vector space model
- query translation
- semantic relations
- n-gram
- natural language
- semantic features
- latent semantic analysis
- information retrieval
- web documents
- retrieval systems
- semantic relationships
- news articles
- co-occurrence
- sentiment classification
- document retrieval
- topic models
- information retrieval systems
- vector space
- transfer learning
- linguistic resources
- digital libraries
- WordNet
- relevance model
- language model
- kNN
- probabilistic model
- query terms
- sentence level
- query expansion