Enriching Character-Based Neural Machine Translation with Modern Documents for Achieving an Orthography Consistency in Historical Documents.
Miguel Domingo, Francisco Casacuberta
Published in: ICIAP Workshops (2019)
Keyphrases
- machine translation
- historical documents
- handwriting recognition
- word recognition
- document images
- multilingual documents
- cross lingual
- information extraction
- historical manuscripts
- language independent
- statistical machine translation
- target language
- source language
- parallel corpora
- handwritten documents
- optical character recognition
- machine translation system
- natural language
- cross language information retrieval
- parallel corpus
- natural language processing
- word alignment
- printed documents
- text lines
- information retrieval
- handwritten text
- word segmentation
- data mining
- character recognition
- web documents
- word spotting
- document collections
- text categorization