Farewell to Aimless Large-scale Pretraining: Influential Subset Selection for Language Model.
Xiao Wang, Weikang Zhou, Qi Zhang, Jie Zhou, Songyang Gao, Junzhe Wang, Menghan Zhang, Xiang Gao, Yunwen Chen, Tao Gui
Published in: ACL (Findings) (2023)
Keyphrases
- language model
- subset selection
- language modeling
- feature selection
- document retrieval
- n-gram
- probabilistic model
- speech recognition
- information retrieval
- retrieval model
- language modelling
- test collection
- mixture model
- ad hoc information retrieval
- query expansion
- context sensitive
- hill climbing
- document ranking
- query terms
- smoothing methods
- statistical language models
- language models for information retrieval
- language model for information retrieval
- pseudo relevance feedback
- bayesian networks
- translation model
- relevance model
- query specific
- genetic algorithm (GA)
- document collections
- document length
- co-occurrence
- search space