RDMA over Ethernet for Distributed Training at Meta Scale.
Adithya Gangidi, Rui Miao, Shengbao Zheng, Sai Jayesh Bondu, Guilherme Goes, Hany Morsy, Rohit Puri, Mohammad Riftadi, Ashmitha Jeevaraj Shetty, Jingyi Yang, Shuqiang Zhang, Mikel Jimenez Fernandez, Shashidhar Gandham, Hongyi Zeng
Published in: SIGCOMM (2024)
Keyphrases
- wide area network
- distributed systems
- local area network
- TCP/IP
- multi agent
- training set
- neural network
- cooperative
- distributed data
- training samples
- training examples
- test set
- training process
- distributed environment
- computer networks
- meta level
- real time
- communication cost
- fault tolerant
- supervised learning
- communication networks
- small scale
- mobile agents
- online learning
- high speed
- distributed processing
- machine learning
- distributed network
- database