Improved Multilingual Language Model Pretraining for Social Media Text via Translation Pair Prediction.
Shubhanshu Mishra, Aria Haghighi. Published in: CoRR (2021)
Keyphrases
- language model
- translation model
- social media
- language modeling
- machine translation system
- statistical machine translation
- information retrieval
- cross lingual
- cross language retrieval
- cross language
- cross language information retrieval
- document retrieval
- comparable corpora
- n gram
- text retrieval
- probabilistic model
- language independent
- query expansion
- speech recognition
- test collection
- language modelling
- document level
- retrieval model
- chinese english
- cross lingual information retrieval
- out of vocabulary
- multiword
- machine translation
- mixture model
- parallel corpora
- query terms
- query translation
- vector space model
- pseudo relevance feedback
- context sensitive
- text mining
- word level
- digital libraries
- ad hoc information retrieval
- text classification
- keywords
- smoothing methods
- bilingual dictionaries
- word pairs
- text documents
- document collections