Reliable and Efficient In-Memory Fault Tolerance of Large Language Model Pretraining.
Yuxin Wang, Shaohuai Shi, Xin He, Zhenheng Tang, Xinglin Pan, Yang Zheng, Xiaoyu Wu, Amelie Chi Zhou, Bingsheng He, Xiaowen Chu
Published in: CoRR (2023)
Keyphrases
- fault tolerance
- language model
- fault tolerant
- language modeling
- document retrieval
- load balancing
- distributed systems
- n-gram
- probabilistic model
- response time
- speech recognition
- database replication
- mixture model
- mobile agents
- test collection
- context sensitive
- statistical language models
- retrieval model
- peer-to-peer
- query terms
- information retrieval
- digital libraries
- data streams
- ad hoc information retrieval
- relevance model
- group communication
- smoothing methods
- language model for information retrieval
- fault management
- language modelling
- database systems
- clustering algorithm