SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Max Ryabinin
Tim Dettmers
Michael Diskin
Alexander Borzunov
Published in: CoRR (2023)
Keyphrases
experimental data
statistical models
training set
model selection
feature selection
probabilistic model
parallel architectures
multi agent
supervised learning
online learning
statistical model
swarm intelligence
computational models
communication systems
parametric models