Tao Tang
ORCID
Publication Activity (10 Years)
Years Active: 2013-2024
Publications (10 Years): 33
Top Topics
Matrix Factorization
Machine Learning
Heterogeneous Platforms
Intel Xeon
Top Venues
CoRR
IPDPS Workshops
CCF Trans. High Perform. Comput.
IPDPS
Publications
Tao Tang, Kai Lu, Lin Peng, Yingbo Cui, Jianbin Fang, Chun Huang, Ruibo Wang, Canqun Yang, Yifei Guo
SNCL: a supernode OpenCL implementation for hybrid computing arrays.
J. Supercomput.
80 (7) (2024)
Fugeng Zhu, Xinxin Qi, Peng Zhang, Jianbin Fang, Tao Tang, Yonggang Che, Kainan Yu, Jing Xie, Chun Huang, Jie Ren
Optimizing Stencil Computation on Multi-core DSPs.
ICPP
(2024)
Kainan Yu, Xinxin Qi, Peng Zhang, Jianbin Fang, Dezun Dong, Ruibo Wang, Tao Tang, Chun Huang, Yonggang Che, Zheng Wang
Optimizing General Matrix Multiplications on Modern Multi-core DSPs.
IPDPS
(2024)
Jianbin Fang, Peng Zhang, Chun Huang, Tao Tang, Kai Lu, Ruibo Wang, Zheng Wang
Programming bare-metal accelerators with heterogeneous threading models: a case study of Matrix-3000.
Frontiers Inf. Technol. Electron. Eng.
24 (4) (2023)
Pengyu Wang, Weiling Yang, Jianbin Fang, Dezun Dong, Chun Huang, Peng Zhang, Tao Tang, Zheng Wang
Optimizing Direct Convolutions on ARM Multi-Cores.
SC
(2023)
Kai Lu, Yaohua Wang, Yang Guo, Chun Huang, Sheng Liu, Ruibo Wang, Jianbin Fang, Tao Tang, Zhaoyun Chen, Biwei Liu, Zhong Liu, Yuanwu Lei, Haiyan Sun
MT-3000: a heterogeneous multi-zone processor for HPC.
CCF Trans. High Perform. Comput.
4 (2) (2022)
Yingbo Cui, Zihang Wang, Johannes Köster, Xiangke Liao, Shaoliang Peng, Tao Tang, Chun Huang, Canqun Yang
VISPR-online: a web-based interactive tool to visualize CRISPR screening experiments.
BMC Bioinform.
22 (1) (2021)
Zeyu Xia, Yingbo Cui, Ang Zhang, Peng Zhang, Sifan Long, Tao Tang, Lin Peng, Chun Huang, Canqun Yang, Xiangke Liao
Large-Scale Parallel Alignment Algorithm for SMRT Reads.
ICA3PP (2)
(2021)
Jing Chen, Jianbin Fang, Weifeng Liu, Tao Tang, Canqun Yang
clMF: A fine-grained and portable alternating least squares algorithm for parallel matrix factorization.
Future Gener. Comput. Syst.
108 (2020)
Peng Zhang, Jianbin Fang, Canqun Yang, Chun Huang, Tao Tang, Zheng Wang
Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures.
IEEE Trans. Parallel Distributed Syst.
31 (8) (2020)
Jianbin Fang, Chun Huang, Tao Tang, Zheng Wang
Parallel programming models for heterogeneous many-cores: a comprehensive survey.
CCF Trans. High Perform. Comput.
2 (4) (2020)
Peng Zhang, Jianbin Fang, Canqun Yang, Chun Huang, Tao Tang, Zheng Wang
Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures: A Machine Learning Based Approach.
CoRR
(2020)
Jianbin Fang, Chun Huang, Tao Tang, Zheng Wang
Parallel Programming Models for Heterogeneous Many-Cores: A Survey.
CoRR
(2020)
Wenxu Zheng, Jianbin Fang, Chen Juan, Feihao Wu, Xiaodong Pan, Hao Wang, Xiaole Sun, Yuan Yuan, Min Xie, Chun Huang, Tao Tang, Zheng Wang
Auto-Tuning MPI Collective Operations on Large-Scale Parallel Systems.
HPCC/SmartCity/DSS
(2019)
Peng Zhang, Tao Tang, Jianbin Fang, Chun Huang, Canqun Yang, Zheng Wang
MOCL: an efficient OpenCL implementation for the Matrix-2000 architecture.
CF
(2018)
Peng Zhang, Jianbin Fang, Tao Tang, Canqun Yang, Zheng Wang
Tuning Streamed Applications on Intel Xeon Phi: A Machine Learning Based Approach.
CoRR
(2018)
Peng Zhang, Jianbin Fang, Tao Tang, Canqun Yang, Zheng Wang
Auto-tuning Streamed Applications on Intel Xeon Phi.
IPDPS
(2018)
Xuhao Chen, Cheng Chen, Jie Shen, Jianbin Fang, Tao Tang, Canqun Yang, Zhiying Wang
Orchestrating parallel detection of strongly connected components on GPUs.
Parallel Comput.
78 (2018)
Jing Chen, Jianbin Fang, Weifeng Liu, Tao Tang, Xuhao Chen, Canqun Yang
Efficient and Portable ALS Matrix Factorization for Recommender Systems.
IPDPS Workshops
(2017)
Xuhao Chen, Pingfan Li, Jianbin Fang, Tao Tang, Zhiying Wang, Canqun Yang
Efficient and high-quality sparse graph coloring on GPUs.
Concurr. Comput. Pract. Exp.
29 (10) (2017)
Jing Chen, Jianbin Fang, Tao Tang, Canqun Yang
多核/众核平台上推荐算法的实现与性能评估 (Implementation and Performance Evaluation of Recommender Algorithms Based on Multi-/Many-core Platforms).
计算机科学 (Computer Science)
44 (10) (2017)
Pingfan Li, Xuhao Chen, Jie Shen, Jianbin Fang, Tao Tang, Canqun Yang
High Performance Detection of Strongly Connected Components in Sparse Graphs on GPUs.
PMAM@PPoPP
(2017)
Tao Tang, Lin Peng, Chun Huang, Canqun Yang
面向存储层次设计优化的GPU程序性能分析 (Performance Analysis of GPU Programs Towards Better Memory Hierarchy Design).
计算机科学 (Computer Science)
44 (12) (2017)
Xi Yang, Jianbin Fang, Jing Chen, Chengkun Wu, Tao Tang, Kai Lu
High Performance Coordinate Descent Matrix Factorization for Recommender Systems.
Conf. Computing Frontiers
(2017)
Cheng Chen, Jianbin Fang, Tao Tang, Canqun Yang
LU factorization on heterogeneous systems: an energy-efficient approach towards high performance.
Computing
99 (8) (2017)
Jianbin Fang, Peng Zhang, Tao Tang, Chun Huang, Canqun Yang
Implementing and Evaluating OpenCL on an ARMv8 Multi-Core CPU.
ISPA/IUCC
(2017)
Zhaokui Li, Jianbin Fang, Tao Tang, Xuhao Chen, Canqun Yang
Streaming Applications on Heterogeneous Platforms.
NPC
(2016)
Zhaokui Li, Jianbin Fang, Tao Tang, Xuhao Chen, Cheng Chen, Canqun Yang
Evaluating the Performance Impact of Multiple Streams on the MIC-based Heterogeneous Platform.
CoRR
(2016)
Zhaokui Li, Jianbin Fang, Tao Tang, Xuhao Chen, Canqun Yang
Streaming Applications on Heterogeneous Platforms.
CoRR
(2016)
Canqun Yang, Cheng Chen, Tao Tang, Xuhao Chen, Jianbin Fang, Jingling Xue
An Energy-Efficient Implementation of LU Factorization on Heterogeneous Systems.
ICPADS
(2016)
Jianbin Fang, Peng Zhang, Zhaokui Li, Tao Tang, Xuhao Chen, Cheng Chen, Canqun Yang
Evaluating Multiple Streams on Heterogeneous Platforms.
Parallel Process. Lett.
26 (4) (2016)
Pingfan Li, Xuhao Chen, Zhe Quan, Jianbin Fang, Huayou Su, Tao Tang, Canqun Yang
High Performance Parallel Graph Coloring on GPGPUs.
IPDPS Workshops
(2016)
Zhaokui Li, Jianbin Fang, Tao Tang, Xuhao Chen, Cheng Chen, Canqun Yang
Evaluating the Performance Impact of Multiple Streams on the MIC-Based Heterogeneous Platform.
IPDPS Workshops
(2016)
Xiangke Liao, Canqun Yang, Zhe Quan, Tao Tang, Cheng Chen
An Efficient Clique-Based Algorithm of Compute Nodes Allocation for In-memory Checkpoint System.
ISC
(2015)
Xiangke Liao, Can-Qun Yang, Tao Tang, Huizhan Yi, Feng Wang, Qiang Wu, Jingling Xue
OpenMC: Towards Simplifying Programming for TianHe Supercomputers.
J. Comput. Sci. Technol.
29 (3) (2014)
Cheng Chen, Canqun Yang, Tao Tang, Qiang Wu, Pengfei Zhang
OpenACC to Intel Offload: Automatic Translation and Optimization.
NCCET
(2013)
Qiang Wu, Canqun Yang, Tao Tang, Liquan Xiao
MIC acceleration of short-range molecular dynamics simulations.
COSMIC@CGO
(2013)
Qiang Wu, Canqun Yang, Tao Tang, Liquan Xiao
Exploiting hierarchy parallelism for molecular dynamics on a petascale heterogeneous system.
J. Parallel Distributed Comput.
73 (12) (2013)