Peng Sun
Publication Activity (10 Years)
Years Active: 2013-2024
Publications (10 Years): 31
Top Topics
Parameter Tuning
Multi Tenant
Single Commodity
Deep Learning
Top Venues
CoRR
IEEE Trans. Big Data
ICPADS
SoCC
Publications
Qiaoling Chen, Diandian Gu, Guoteng Wang, Xun Chen, YingTong Xiong, Ting Huang, Qinghao Hu, Xin Jin, Yonggang Wen, Tianwei Zhang, Peng Sun
InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding.
CoRR
(2024)
Qinghao Hu, Meng Zhang, Peng Sun, Yonggang Wen, Tianwei Zhang
Lucid: A Non-intrusive, Scalable and Interpretable Scheduler for Deep Learning Training Jobs.
ASPLOS (2)
(2023)
Qiaoling Chen, Qinghao Hu, Zhisheng Ye, Guoteng Wang, Peng Sun, Yonggang Wen, Tianwei Zhang
AMSP: Super-Scaling LLM Training via Advanced Model States Partitioning.
CoRR
(2023)
Qinghao Hu, Zhisheng Ye, Meng Zhang, Qiaoling Chen, Peng Sun, Yonggang Wen, Tianwei Zhang
Hydro: Surrogate-Based Hyperparameter Tuning Service in Datacenters.
OSDI
(2023)
Meng Zhang, Qinghao Hu, Peng Sun, Yonggang Wen, Tianwei Zhang
Boosting Distributed Full-graph GNN Training with Asynchronous One-bit Communication.
CoRR
(2023)
Peng Sun, Yonggang Wen, Ruobing Han, Wansen Feng, Shengen Yan
GradientFlow: Optimizing Network Performance for Large-Scale Distributed DNN Training.
IEEE Trans. Big Data
8 (2) (2022)
Ruofan Liang, Bingsheng He, Shengen Yan, Peng Sun
A Simulation Platform for Multi-tenant Machine Learning Services on Thousands of GPUs.
CoRR
(2022)
Wei Gao, Qinghao Hu, Zhisheng Ye, Peng Sun, Xiaolin Wang, Yingwei Luo, Tianwei Zhang, Yonggang Wen
Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision.
CoRR
(2022)
Zhisheng Ye, Peng Sun, Wei Gao, Tianwei Zhang, Xiaolin Wang, Shengen Yan, Yingwei Luo
Astraea: A Fair Deep Learning Scheduler for Multi-Tenant GPU Clusters.
IEEE Trans. Parallel Distributed Syst.
33 (11) (2022)
Wei Gao, Peng Sun, Yonggang Wen, Tianwei Zhang
Titan: a scheduler for foundation model fine-tuning workloads.
SoCC
(2022)
Qinghao Hu, Harsha Nori, Peng Sun, Yonggang Wen, Tianwei Zhang
Primo: Practical Learning-Augmented Systems with Interpretable Models.
USENIX Annual Technical Conference
(2022)
Wei Gao, Zhisheng Ye, Peng Sun, Yonggang Wen, Tianwei Zhang
Chronus: A Novel Deadline-aware Scheduler for Deep Learning Training Jobs.
SoCC
(2021)
Qinghao Hu, Peng Sun, Shengen Yan, Yonggang Wen, Tianwei Zhang
Characterization and Prediction of Deep Learning Workloads in Large-Scale GPU Datacenters.
CoRR
(2021)
Yizheng Huang, Huaizheng Zhang, Yonggang Wen, Peng Sun, Nguyen Binh Duong Ta
ModelCI-e: Enabling Continual Learning in Deep Learning Serving Systems.
CoRR
(2021)
Qinghao Hu, Peng Sun, Shengen Yan, Yonggang Wen, Tianwei Zhang
Characterization and prediction of deep learning workloads in large-scale GPU datacenters.
SC
(2021)
Lei Xie, Jidong Zhai, Baodong Wu, Yuanbo Wang, Xingcheng Zhang, Peng Sun, Shengen Yan
Elan: Towards Generic and Efficient Elastic Training for Deep Learning.
ICDCS
(2020)
Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, Xiaokui Xiao
GraphMP: I/O-Efficient Big Graph Analytics on a Single Commodity Machine.
IEEE Trans. Big Data
6 (4) (2020)
Peng Sun, Wansen Feng, Ruobing Han, Shengen Yan, Yonggang Wen
Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes.
CoRR
(2019)
Zhenzhen Hu, Peng Sun, Yonggang Wen
Speeding-Up Age Estimation in Intelligent Demographics System via Network Optimization.
ICC
(2018)
Wuqiong Luo, Wee Peng Tay, Peng Sun, Yonggang Wen
On Distributed Algorithms for Cost-Efficient Data Center Placement in Cloud Computing.
CoRR
(2018)
Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, Xiaokui Xiao
GraphMP: I/O-Efficient Big Graph Analytics on a Single Commodity Machine.
CoRR
(2018)
Peng Sun, Yonggang Wen, Duong Nguyen Binh Ta, Haiyong Xie
MetaFlow: A Scalable Metadata Lookup Service for Distributed File Systems in Data Centers.
IEEE Trans. Big Data
4 (2) (2018)
Zhenzhen Hu, Peng Sun, Yonggang Wen
Speeding-up Age Estimation in Intelligent Demographics System via Network Optimization.
CoRR
(2018)
Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, Shengen Yan
Towards Distributed Machine Learning in Shared Clusters: A Dynamically-Partitioned Approach.
CoRR
(2017)
Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, Shengen Yan
Towards Distributed Machine Learning in Shared Clusters: A Dynamically-Partitioned Approach.
SMARTCOMP
(2017)
Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, Xiaokui Xiao
GraphH: High Performance Big Graph Analytics in Small Clusters.
CLUSTER
(2017)
Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, Xiaokui Xiao
GraphH: High Performance Big Graph Analytics in Small Clusters.
CoRR
(2017)
Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, Xiaokui Xiao
GraphMP: An Efficient Semi-External-Memory Big Graph Processing System on a Single Machine.
ICPADS
(2017)
Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, Xiaokui Xiao
GraphMP: An Efficient Semi-External-Memory Big Graph Processing System on a Single Machine.
CoRR
(2017)
Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, Haiyong Xie
MetaFlow: a Scalable Metadata Lookup Service for Distributed File Systems in Data Centers.
CoRR
(2016)
Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong, Shengen Yan
Timed Dataflow: Reducing Communication Overhead for Distributed Machine Learning Systems.
ICPADS
(2016)
Zhiming Hu, Yan Qiao, Jun Luo, Peng Sun, Yonggang Wen
CREATE: Correlation enhanced traffic matrix estimation in Data Center Networks.
Networking
(2014)
Jianxiong Yin, Peng Sun, Yonggang Wen, Haigang Gong, Ming Liu, Xuelong Li, Haipeng You, Jinqi Gao, Cynthia Lin
Cloud3DView: an interactive tool for cloud data center operations.
SIGCOMM
(2013)