ParaCrawl: Web-Scale Acquisition of Parallel Corpora.
Marta Bañón, Pinzhen Chen, Barry Haddow, Kenneth Heafield, Hieu Hoang, Miquel Esplà-Gomis, Mikel L. Forcada, Amir Kamran, Faheem Kirefu, Philipp Koehn, Sergio Ortiz-Rojas, Leopoldo Pla Sempere, Gema Ramírez-Sánchez, Elsa Sarrías, Marek Strelec, Brian Thompson, William Waites, Dion Wiggins, Jaume Zaragoza
Published in: ACL (2020)
Keyphrases
- web scale
- parallel corpora
- cross language information retrieval
- machine translation
- labor intensive
- language independent
- word pairs
- semi structured
- image search
- cross lingual
- machine translation system
- cross language
- query translation
- wikipedia articles
- sentence level
- databases
- web images
- semi automatic
- statistical machine translation
- machine learning
- knowledge representation
- structured data
- information extraction