A Multilingual Dataset for Evaluating Parallel Sentence Extraction from Comparable Corpora.
Pierre Zweigenbaum, Serge Sharoff, Reinhard Rapp
Published in: LREC (2018)
Keyphrases
- comparable corpora
- cross-language information retrieval
- news articles
- language modeling
- parallel corpora
- sentence extraction
- machine translation
- word pairs
- text corpora
- cross-lingual
- automatic summarization
- query translation
- cross-language
- multi-document summarization
- language model
- text documents
- feature set
- active learning
- digital libraries
- text summarization
- semantic relations
- translation model
- n-gram
- co-occurrence