STACC, OOV Density and N-gram Saturation: Vicomtech's Participation in the WMT 2018 Shared Task on Parallel Corpus Filtering.
Andoni Azpeitia, Thierry Etchegoyhen, Eva Martínez Garcia
Published in: WMT (shared task) (2018)
Keyphrases
- n-gram
- parallel corpus
- cross lingual
- out of vocabulary
- language independent
- language modeling
- cross language information retrieval
- language model
- text classification
- machine translation
- word segmentation
- cross language
- parallel corpora
- query translation
- part of speech
- bag of words
- word alignment
- translation model
- statistical machine translation
- word level
- named entity recognition
- query terms
- relevance model
- machine learning
- machine translation system
- feature selection