Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training.
Bo Zheng, Li Dong, Shaohan Huang, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei
Published in: CoRR (2021)
Keyphrases
- language modeling
- cross-lingual
- language model
- speech recognition
- n-gram
- translation model
- pseudo feedback
- document retrieval
- information retrieval
- language-independent
- probabilistic model
- cross-language
- retrieval model
- statistical machine translation
- test collection
- cross-lingual information retrieval
- query expansion
- cross-language retrieval
- smoothing methods
- relevance model
- context-sensitive
- machine translation
- text classification
- query translation
- vector space model
- pseudo relevance feedback
- word segmentation
- linguistic resources
- out-of-vocabulary
- generative model
- query-specific
- machine learning
- parallel corpora
- multiword
- speech signal
- information extraction
- query terms
- bag-of-words
- supervised learning
- topic models