Cross-lingual document similarity estimation and dictionary generation with comparable corpora.
Tadej Stajner, Dunja Mladenic
Published in: Knowl. Inf. Syst. (2019)
Keyphrases
- cross lingual
- comparable corpora
- bilingual dictionaries
- parallel corpus
- parallel corpora
- document clustering
- machine translation
- language modeling
- cross language information retrieval
- cross language
- text documents
- language independent
- translation model
- information retrieval
- news articles
- query translation
- language model
- text classification
- information retrieval systems
- document retrieval
- source language
- text mining
- document collections
- machine translation system
- statistical machine translation
- web documents
- word pairs
- vector space model
- query terms
- text categorization
- latent semantic analysis
- retrieval model
- probabilistic model
- feature selection
- keywords
- clustering algorithm
- multiword
- labor intensive
- natural language processing
- retrieval systems
- web search engines