Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness.

Eric Zelikman, Qian Huang, Percy Liang, Nick Haber, Noah D. Goodman
Published in: CoRR (2023)