Two Stacks Are Better Than One: A Comparison of Language Modeling and Translation as Multilingual Pretraining Objectives.
Zihao Li, Shaoxiong Ji, Timothee Mickus, Vincent Segonne, Jörg Tiedemann
Published in: CoRR (2024)
Keyphrases
- language modeling
- cross-lingual
- comparable corpora
- language model
- translation model
- query expansion
- information retrieval
- cross-language
- parallel corpora
- retrieval model
- cross-language information retrieval
- machine translation
- parallel corpus
- n-gram
- query translation
- probabilistic model
- machine translation system
- language independent
- sentence retrieval
- statistical machine translation
- bilingual dictionaries
- document retrieval
- word segmentation
- text classification
- improvements in retrieval effectiveness
- relevance model
- statistical language models
- statistical language modeling
- statistical translation models
- linguistic resources
- query terms
- test collection
- high-dimensional