Harvesting comparable corpora and mining them for equivalent bilingual sentences using statistical classification and analogy-based heuristics.
Krzysztof Wolk, Emilia Rejmund, Krzysztof Marasek
Published in: CoRR (2015)
Keyphrases
- comparable corpora
- statistical classification
- parallel corpora
- cross language information retrieval
- bilingual lexicon
- news articles
- word pairs
- machine translation
- sentence level
- text corpora
- machine translation system
- cross lingual
- language modeling
- text categorization
- text mining
- terminology extraction
- text documents
- cross language
- data mining
- word alignment
- query translation
- natural language
- sentiment analysis
- source language
- natural language processing
- bilingual dictionaries
- information extraction
- knowledge discovery
- labor intensive
- target language
- parallel corpus
- computational linguistics
- translation model
- statistical machine translation
- k nearest neighbor
- knn
- information retrieval