Ensembles of Classifiers for Cleaning Web Parallel Corpora and Translation Memories.
Eduard Barbu
Published in: RANLP (2017)
Keyphrases
- parallel corpora
- cross language information retrieval
- machine translation
- comparable corpora
- machine translation system
- language independent
- query translation
- statistical machine translation
- decision trees
- cross lingual
- English-Chinese
- language resources
- labor intensive
- web pages
- cross language
- word pairs
- parallel texts
- bilingual dictionaries
- web documents
- training data
- sentence pairs
- sentence level
- web mining
- information sources
- web data
- translation model
- naive bayes
- search engine
- fully automated
- Wikipedia articles
- n-gram
- co-occurrence
- query terms
- parallel corpus
- training set
- error prone