ParaCrawl: Web-scale parallel corpora for the languages of the EU.
Miquel Esplà-Gomis, Mikel L. Forcada, Gema Ramírez-Sánchez, Hieu Hoang
Published in: MTSummit (2) (2019)
Keyphrases
- web scale
- parallel corpora
- language independent
- comparable corpora
- cross lingual
- machine translation
- cross language information retrieval
- labor intensive
- machine translation system
- bilingual dictionaries
- statistical machine translation
- word pairs
- cross language
- query translation
- image search
- semi structured
- sentence level
- web images
- wikipedia articles
- information extraction
- databases
- knowledge discovery
- feature space
- data mining