ACCL: Architecting Highly Scalable Distributed Training Systems With Highly Efficient Collective Communication Library.
Jianbo Dong, Shaochuang Wang, Fei Feng, Zheng Cao, Heng Pan, Lingbo Tang, Pengcheng Li, Hao Li, Qianyuan Ran, Yiqun Guo, Shanyuan Gao, Xin Long, Jie Zhang, Yong Li, Zhisheng Xia, Liuyihan Song, Yingya Zhang, Pan Pan, Guohui Wang, Xiaowei Jiang
Published in: IEEE Micro (2021)
Keyphrases
- highly efficient
- highly scalable
- distributed systems
- communication overhead
- distributed computation
- open systems
- distributed network
- low cost
- communication cost
- single point of failure
- multimedia communication
- multi agent
- fully distributed
- web caching
- global knowledge
- data partitioning
- computer networks
- computer systems
- real time