An empirical analysis of compute-optimal large language model training.
Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katherine Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Oriol Vinyals, Jack W. Rae, Laurent Sifre
Published in: NeurIPS (2022)
Keyphrases
- language model
- language modeling
- n gram
- document retrieval
- probabilistic model
- mixture model
- speech recognition
- retrieval model
- information retrieval
- query expansion
- language modelling
- statistical language models
- ad hoc information retrieval
- context sensitive
- training set
- pseudo relevance feedback
- smoothing methods
- vector space model
- test collection
- language model for information retrieval
- translation model
- statistical machine translation
- document ranking
- query specific
- semi supervised
- word clouds