​
Wei Wu
Publication Activity (10 Years)
Years Active: 2009-2022
Publications (10 Years): 19
Top Topics
Benchmark Suite
Memory Management
Directed Acyclic Graph
Neural Network
Top Venues
CoRR
HPDC
IEEE Trans. Parallel Distributed Syst.
CLUSTER
Publications
Qinglei Cao, George Bosilca, Nuria Losada, Wei Wu, Dong Zhong, Jack J. Dongarra
Evaluating Data Redistribution in PaRSEC.
IEEE Trans. Parallel Distributed Syst.
33 (8) (2022)
Colin Unger, Zhihao Jia, Wei Wu, Sina Lin, Mandeep Baines, Carlos Efrain Quintero Narvaez, Vinay Ramakrishnaiah, Nirmal Prajapati, Patrick S. McCormick, Jamaludin Mohd-Yusof, Xi Luo, Dheevatsa Mudigere, Jongsoo Park, Misha Smelyanskiy, Alex Aiken
Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization.
OSDI
(2022)
Tong Geng, Ang Li, Tianqi Wang, Chunshu Wu, Yanfei Li, Runbin Shi, Wei Wu, Martin C. Herbordt
O3BNN-R: An Out-of-Order Architecture for High-Performance and Regularized BNN Inference.
IEEE Trans. Parallel Distributed Syst.
32 (1) (2021)
Xi Luo, Wei Wu, George Bosilca, Yu Pei, Qinglei Cao, Thananon Patinyasakdikul, Dong Zhong, Jack J. Dongarra
HAN: a Hierarchical AutotuNed Collective Communication Framework.
CLUSTER
(2020)
Qinglei Cao, George Bosilca, Wei Wu, Dong Zhong, Aurelien Bouteiller, Jack J. Dongarra
Flexible Data Redistribution in a Task-Based Runtime System.
CLUSTER
(2020)
Elliott Slaughter, Wei Wu, Yuankun Fu, Legend Brandenburg, Nicolai Garcia, Wilhem Kautz, Emily Marx, Kaleb S. Morris, Qinglei Cao, George Bosilca, Seema Mirchandaney, Wonchan Lee, Sean Treichler, Patrick S. McCormick, Alex Aiken
Task bench: a parameterized benchmark for evaluating parallel runtime performance.
SC
(2020)
Linnan Wang, Wei Wu, Junyu Zhang, Hang Liu, George Bosilca, Maurice Herlihy, Rodrigo Fonseca
FFT-based Gradient Sparsification for the Distributed Training of Deep Neural Networks.
HPDC
(2020)
Tong Geng, Tianqi Wang, Chunshu Wu, Chen Yang, Wei Wu, Ang Li, Martin C. Herbordt
O3BNN: an out-of-order architecture for high-performance binarized neural network inference with fine-grained pruning.
ICS
(2019)
Elliott Slaughter, Wei Wu, Yuankun Fu, Legend Brandenburg, Nicolai Garcia, Wilhem Kautz, Emily Marx, Kaleb S. Morris, Wonchan Lee, Qinglei Cao, George Bosilca, Seema Mirchandaney, Sean Treichler, Patrick S. McCormick, Alex Aiken
Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance.
CoRR
(2019)
Linnan Wang, Wei Wu, Yiyang Zhao, Junyu Zhang, Hang Liu, George Bosilca, Jack J. Dongarra, Maurice Herlihy, Rodrigo Fonseca
SuperNeurons: FFT-based Gradient Sparsification in the Distributed Training of Deep Neural Networks.
CoRR
(2018)
Xi Luo, Wei Wu, George Bosilca, Thananon Patinyasakdikul, Linnan Wang, Jack J. Dongarra
ADAPT: an event-based adaptive collective communication framework.
HPDC
(2018)
Linnan Wang, Jinmian Ye, Yiyang Zhao, Wei Wu, Ang Li, Shuaiwen Leon Song, Zenglin Xu, Tim Kraska
SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks.
CoRR
(2018)
Linnan Wang, Jinmian Ye, Yiyang Zhao, Wei Wu, Ang Li, Shuaiwen Leon Song, Zenglin Xu, Tim Kraska
Superneurons: dynamic GPU memory management for training deep neural networks.
PPOPP
(2018)
Dali Wang, Yu Pei, Oscar R. Hernandez, Wei Wu, Zhuo Yao, Youngsung Kim, Michael Wolfe, Ryan Kitchen
Compiler technologies for understanding legacy scientific code: A case study on an ACME land module.
ICCS
(2017)
Yiyang Zhao, Linnan Wang, Wei Wu, George Bosilca, Richard W. Vuduc, Jinmian Ye, Wenqi Tang, Zenglin Xu
Efficient Communications in Training Large Scale Neural Networks.
ACM Multimedia (Thematic Workshops)
(2017)
Yang Xu, Dali Wang, Tomislav Janjusic, Wei Wu, Yu Pei, Zhuo Yao
A Web-based Visual Analytic Framework for Understanding Large-scale Environmental Models: A Use Case for The Community Land Model.
ICCS
(2017)
Wei Wu, George Bosilca, Rolf Vandevaart, Sylvain Jeaugey, Jack J. Dongarra
GPU-Aware Non-contiguous Data Movement In Open MPI.
HPDC
(2016)
Linnan Wang, Wei Wu, Zenglin Xu, Jianxiong Xiao, Yi Yang
BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing.
ICS
(2016)
Sooraj Puthoor, Ashwin M. Aji, Shuai Che, Mayank Daga, Wei Wu, Bradford M. Beckmann, Gregory Rodgers
Implementing directed acyclic graphs with the heterogeneous system architecture.
GPGPU@PPoPP
(2016)
Linnan Wang, Wei Wu, Jianxiong Xiao, Yi Yang
BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing.
CoRR
(2015)
Linnan Wang, Wei Wu, Jianxiong Xiao, Yang Yi
Large Scale Artificial Neural Network Training Using Multi-GPUs.
CoRR
(2015)
Wei Wu, Aurélien Bouteiller, George Bosilca, Mathieu Faverge, Jack J. Dongarra
Hierarchical DAG Scheduling for Hybrid Distributed Systems.
IPDPS
(2015)
Dali Wang, Tomislav Janjusic, Colleen Iversen, Peter E. Thornton, Misha Karssovski, Wei Wu, Yang Xu
A Scientific Function Test Framework for Modular Environmental Model Development: Application to the Community Land Model.
SE4HPCS@ICSE
(2015)
Xiaoli Yang, Wei Wu, Charles C. Tseng
Algorithms for modeling structural changes in human chromosomes.
Comput. Methods Programs Biomed.
110 (2) (2013)
Xiaoli Yang, Wei Wu, Ding Wen, Bin Chen, Jason Lacny, Charles Tseng
Virtual chromosome modeling for learning human cytogenetics.
ICCA
(2010)
Wei Wu, Xiaoli Yang, Bin Chen, Zhenpeng Zhao, Jason Lacny, Charles Tseng
Computer Based Simulation of Chromosome Abnormality.
BIOCOMP
(2009)