Understanding the Effectiveness of Early Weight Averaging for Training Large Language Models.
Sunny Sanyal, Jean Kaddour, Abhishek Kumar, Sujay Sanghavi
Published in: CoRR (2023)
Keyphrases
- language model
- language modeling
- n gram
- speech recognition
- probabilistic model
- document retrieval
- statistical language models
- language modelling
- context sensitive
- retrieval model
- query expansion
- document ranking
- language models for information retrieval
- ad hoc information retrieval
- information retrieval
- test collection
- vector space model
- training set
- language model for information retrieval
- smoothing methods
- translation model
- query terms
- relevance model
- term dependencies
- weighting scheme
- automatic speech recognition
- document length
- relevant documents
- relevance feedback
- decision trees