A Recipe of Parallel Corpora Exploitation for Multilingual Large Language Models.
Peiqin Lin, André F. T. Martins, Hinrich Schütze
Published in: CoRR (2024)
Keyphrases
- language model
- parallel corpora
- language modeling
- cross lingual
- comparable corpora
- cross language information retrieval
- language independent
- cross language
- document retrieval
- cross lingual information retrieval
- n gram
- statistical machine translation
- query terms
- translation model
- machine translation system
- information retrieval
- chinese english
- probabilistic model
- query expansion
- retrieval model
- bilingual dictionaries
- parallel corpus
- query translation
- relevance model
- machine translation
- context sensitive
- test collection
- sentence level
- document representation
- out of vocabulary
- pseudo relevance feedback
- sentiment classification
- vector space model
- multiword
- search queries
- document collections
- web search
- feature selection