OCR Improves Machine Translation for Low-Resource Languages.
Oana Ignat, Jean Maillard, Vishrav Chaudhary, Francisco Guzmán
Published in: ACL (Findings) (2022)
Keyphrases
- machine translation
- target language
- language independent
- cross lingual
- statistical machine translation
- multilingual documents
- source language
- language specific
- language resources
- machine translation system
- query translation
- parallel corpora
- natural language processing
- cross language information retrieval
- optical character recognition
- machine readable dictionaries
- bilingual dictionaries
- language processing
- comparable corpora
- natural language
- Chinese English
- cross lingual information retrieval
- information extraction
- natural language generation
- cross language
- character recognition
- word sense disambiguation
- word level
- word segmentation
- multilingual information retrieval
- document images
- Brazilian Portuguese
- linguistic resources
- word order
- grammar induction
- word alignment
- finite state transducers
- machine transliteration