MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs.
Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao, Liang Xiang, Zherui Liu, Zhe Li, Xiaoying Jia, Jianxi Ye, Xin Jin, Xin Liu
Published in: NSDI (2024)
Keyphrases
- language model
- language modeling
- n gram
- probabilistic model
- speech recognition
- document retrieval
- information retrieval
- query expansion
- test collection
- retrieval model
- context sensitive
- language modelling
- mixture model
- training set
- ad hoc information retrieval
- vector space model
- language model for information retrieval
- statistical language models
- smoothing methods
- pseudo relevance feedback
- translation model
- relevance model
- bayesian networks
- statistical machine translation
- document ranking
- document length
- query terms