A High-Quality Multilingual Dataset for Structured Documentation Translation.
Kazuma Hashimoto, Raffaella Buschiazzo, James Bradbury, Teresa Marshall, Richard Socher, Caiming Xiong
Published in: CoRR (2020)
Keyphrases
- high quality
- cross language information retrieval
- language resources
- cross language ir
- cross language
- chinese english
- machine translation
- machine translation system
- query translation
- digital libraries
- benchmark datasets
- database
- comparable corpora
- low quality
- structured data
- real world
- bilingual dictionaries
- parallel corpus
- data sets
- higher quality
- training dataset
- language independent
- statistical machine translation
- image quality
- parallel corpora
- ground truth
- high resolution
- cross lingual information retrieval
- learning algorithm