Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness.
Eric Zelikman, Qian Huang, Percy Liang, Nick Haber, Noah D. Goodman. Published in: CoRR (2023)
Keyphrases
- language model
- low bandwidth
- language modeling
- wireless networks
- mobile devices
- mobile computing
- wireless communication
- high bandwidth
- n gram
- information retrieval
- probabilistic model
- retrieval model
- video conferencing
- computing environments
- database server
- communication cost
- query expansion
- distributed systems
- mixture model
- context sensitive
- ad hoc information retrieval
- peer to peer
- database applications
- mobile networks
- translation model
- high dimensional
- data analysis
- smoothing methods
- machine learning