SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient.

Max Ryabinin, Tim Dettmers, Michael Diskin, Alexander Borzunov
Published in: CoRR (2023)
Keyphrases