The JRC-Acquis: A Multilingual Aligned Parallel Corpus with 20+ Languages.
Ralf Steinberger, Bruno Pouliquen, Anna Widiger, Camelia Ignat, Tomaz Erjavec, Dan Tufis, Dániel Varga
Published in: LREC (2006)
Keyphrases
- parallel corpus
- cross lingual
- language independent
- query translation
- machine translation system
- target language
- cross lingual information retrieval
- machine translation
- sentence pairs
- statistical machine translation
- cross language
- lexical knowledge
- cross language information retrieval
- parallel corpora
- source language
- language modeling
- text classification
- comparable corpora
- document classification
- n gram
- bilingual dictionaries
- translation model
- document clustering
- word alignment
- text retrieval
- news articles
- language model
- clustering algorithm