The Tatoeba Translation Challenge - Realistic Data Sets for Low Resource and Multilingual MT.
Jörg Tiedemann
Published in: WMT@EMNLP (2020)
Keyphrases
- machine translation
- data sets
- cross language information retrieval
- language resources
- cross lingual
- chinese english
- language independent
- query translation
- cross language
- machine translation system
- language specific
- cross lingual information retrieval
- real life
- real world
- parallel corpus
- multilingual documents
- comparable corpora
- statistical machine translation
- resource allocation
- resource management
- information extraction
- target language
- bilingual dictionaries
- parallel corpora
- cross language ir
- word alignment
- high levels
- web resources
- benchmark data sets
- natural language processing
- digital libraries
- data streams
- real world data sets
- resource consumption
- information resources
- text retrieval
- question answering
- high dimensional data
- wordnet
- data sources
- training data
- information retrieval
- database