Expanding machine translation training data with an out-of-domain corpus using language modeling based vocabulary saturation.
Burak Aydin, Arzucan Özgür. Published in: AMTA (2014)
Keyphrases
- language modeling
- machine translation
- cross-lingual
- comparable corpora
- language model
- training data
- statistical machine translation
- parallel corpus
- finite state transducers
- machine translation system
- parallel corpora
- query expansion
- language-independent
- sentence retrieval
- retrieval model
- probabilistic model
- information retrieval
- cross-language
- natural language processing
- translation model
- n-gram
- information extraction
- learning algorithm
- word sense disambiguation
- target language
- training set
- natural language
- word alignment
- out-of-vocabulary
- cross-language information retrieval
- sentiment classification
- text classification
- cross-domain
- query translation
- data mining
- co-occurrence
- test collection
- document-level
- relevance model