Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach.
Jianbo Dong, Bin Luo, Jun Zhang, Pengcheng Zhang, Fei Feng, Yikai Zhu, Ang Liu, Zian Chen, Yi Shi, Hairong Jiao, Gang Lu, Yu Guan, Ennan Zhai, Wencong Xiao, Hanyu Zhao, Man Yuan, Siran Yang, Xiang Li, Jiamang Wang, Rui Men, Jianwei Zhang, Huang Zhong, Dennis Cai, Yuan Xie, Binzhang Fu
Published in: CoRR (2024)
Keyphrases
- training set
- early stopping
- parallel implementation
- small scale
- real world
- training phase
- parallel processing
- real life
- decision trees
- hidden markov models
- test set
- computational complexity
- communication networks
- data sets
- weighted sums
- training error
- parallel execution
- distributed memory
- computer architecture
- object detectors
- hearing impaired
- weak classifiers
- communication cost
- ensemble learning
- ensemble methods
- wireless networks
- face detection
- data driven
- object detection
- learning algorithm