What Language Model to Train if You Have One Million GPU Hours?
Teven Le Scao, Thomas Wang, Daniel Hesslow, Lucile Saulnier, Stas Bekman, M. Saiful Bari, Stella Biderman, Hady Elsahar, Niklas Muennighoff, Jason Phang, Ofir Press, Colin Raffel, Victor Sanh, Sheng Shen, Lintang Sutawika, Jaesung Tae, Zheng Xin Yong, Julien Launay, Iz Beltagy
Published in: CoRR (2022)
Keyphrases
- language model
- language modeling
- n-gram
- speech recognition
- document retrieval
- information retrieval
- probabilistic model
- test collection
- retrieval model
- query expansion
- language modelling
- statistical language models
- ad hoc information retrieval
- smoothing methods
- mixture model
- query terms
- pseudo relevance feedback
- translation model
- language model for information retrieval
- relevance model
- context sensitive
- document length
- multiword
- word error rate
- language modeling framework
- probability distribution
- word clouds
- hidden Markov models