Bilingual corpus cleaning focusing on translation literality.
Kenji Imamura, Eiichiro Sumita. Published in: INTERSPEECH (2002)
Keyphrases
- parallel corpora
- parallel corpus
- statistical machine translation
- chinese english
- machine translation
- sentence pairs
- machine translation system
- cross language information retrieval
- comparable corpora
- parallel texts
- query translation
- word alignment
- cross lingual
- english chinese
- multiword
- manually annotated
- language resources
- language independent
- english words
- cross lingual information retrieval
- word pairs
- source language
- bilingual dictionaries
- wordnet
- target language
- cross language retrieval
- bilingual lexicon
- cross language
- sentence level
- news articles
- labor intensive
- translation model
- lexical knowledge
- statistical translation models
- language model
- information extraction
- linguistic resources
- out of vocabulary
- training corpus
- wikipedia articles
- noun phrases
- test set
- co occurrence
- information retrieval