Harvesting Comparable Corpora and Mining Them for Equivalent Bilingual Sentences Using Statistical Classification and Analogy-Based Heuristics.
Krzysztof Wolk, Emilia Rejmund, Krzysztof Marasek. Published in: ISMIS (2015)
Keyphrases
- comparable corpora
- statistical classification
- parallel corpora
- cross-language information retrieval
- bilingual lexicon
- news articles
- machine translation
- word pairs
- sentence level
- language modeling
- machine translation system
- terminology extraction
- text corpora
- cross-lingual
- cross-language
- text mining
- text documents
- bilingual dictionaries
- text categorization
- source language
- translation model
- natural language
- knowledge discovery
- word alignment
- statistical machine translation
- target language
- multi-document summarization
- query translation
- bidirectional
- parallel corpus
- machine learning
- language model
- feature extraction