Source Prompt: Coordinated Pre-training of Language Models on Diverse Corpora from Multiple Sources.
Yipei Xu, Dakuan Lu, Jiaqing Liang, Xintao Wang, Yipeng Geng, Yingsi Xin, Hengkui Wu, Ken Chen, Ruiji Zhang, Yanghua Xiao
Published in: CoRR (2023)
Keyphrases
- multiple sources
- language model
- language modeling
- multi source
- n gram
- speech recognition
- probabilistic model
- statistical language models
- information retrieval
- data sources
- query expansion
- document retrieval
- statistical machine translation
- language modelling
- retrieval model
- context sensitive
- translation model
- vector space model
- document ranking
- query terms
- test collection
- relevance model
- viewpoint
- databases
- feature selection
- okapi bm