Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates.
Insu Jang
Zhenning Yang
Zhen Zhang
Xin Jin
Mosharaf Chowdhury
Published in:
CoRR (2023)
Keyphrases
probabilistic model
statistical models
training phase
distributed environment
online learning
parametric models
training process
computational models
fault tolerant
parameter estimation
database
peer to peer
distributed systems
prior knowledge
training set
cooperative
database systems
machine learning