SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient.
Max Ryabinin
Tim Dettmers
Michael Diskin
Alexander Borzunov
Published in: ICML (2023)
Keyphrases
statistical models
feature space
parameter estimation
feature selection
training data
cooperative
training set
probabilistic model
particle swarm optimization
training samples
communication systems
training process
structured prediction