The Tatoeba Translation Challenge - Realistic Data Sets for Low Resource and Multilingual MT.
Jörg Tiedemann. Published in: CoRR (2020)
Keyphrases
- machine translation
- data sets
- cross language information retrieval
- language resources
- cross lingual
- query translation
- Chinese-English
- language independent
- machine translation system
- language specific
- cross language
- cross lingual information retrieval
- multilingual documents
- comparable corpora
- resource allocation
- parallel corpora
- word alignment
- cross language IR
- natural language processing
- synthetic data
- real world data sets
- high levels
- target language
- real world
- statistical machine translation
- bilingual dictionaries
- parallel corpus
- neural network
- training data
- natural language
- digital libraries
- training set
- linguistic resources
- real life
- resource management
- resource constraints
- artificial intelligence
- information extraction
- source language
- resource consumption
- information access