MT-based Sentence Alignment for OCR-generated Parallel Texts.
Rico Sennrich, Martin Volk. Published in: AMTA (2010)
Keyphrases
- parallel texts
- lexico-syntactic
- cross-language information retrieval
- parallel corpus
- machine translation
- manually constructed
- word alignment
- manually annotated
- query translation
- parallel corpora
- automatically generated
- statistical machine translation
- document images
- natural language
- machine translation system
- target language
- bilingual dictionaries
- wordnet
- language-independent
- co-occurrence
- source language
- part of speech
- ground truth