MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs.
Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao, Liang Xiang, Zherui Liu, Zhe Li, Xiaoying Jia, Jianxi Ye, Xin Jin, Xin Liu
Published in: CoRR (2024)
Keyphrases
- language model
- language modeling
- retrieval model
- document retrieval
- n gram
- information retrieval
- probabilistic model
- query expansion
- test collection
- language modelling
- speech recognition
- mixture model
- statistical language models
- language model for information retrieval
- query terms
- ad hoc information retrieval
- document ranking
- vector space model
- pseudo relevance feedback
- translation model
- context sensitive
- smoothing methods
- relevance model
- web search
- word error rate
- relevance feedback
- probability distribution
- feature selection