Allocating Large Vocabulary Capacity for Cross-Lingual Language Model Pre-Training.
Bo Zheng, Li Dong, Shaohan Huang, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei
Published in: EMNLP (1) (2021)
Keyphrases
- language modeling
- cross lingual
- language model
- speech recognition
- translation model
- n gram
- language independent
- probabilistic model
- information retrieval
- pseudo feedback
- document retrieval
- cross lingual information retrieval
- cross language
- query expansion
- retrieval model
- cross language retrieval
- test collection
- statistical machine translation
- machine translation
- relevance model
- automatic speech recognition
- text classification
- training set
- pseudo relevance feedback
- word segmentation
- parallel corpora
- out of vocabulary
- bag of words
- query specific
- linguistic resources
- context sensitive
- query terms