Dong Dai
ORCID
Publication Activity (10 Years)
Years Active: 2012-2024
Publications (10 Years): 51
Top Topics
Anomaly Detection
Reinforcement Learning
Prediction Scheme
File System
Top Venues
CoRR
HPDC
IPDPS
SC
Publications
Di Zhang
,
Monish Soundar Raj
,
Bing Xie
,
Sheng Di
,
Dong Dai
Cross-System Analysis of Job Characterization and Scheduling in Large-Scale Computing Clusters.
IPDPS
(2024)
Runzhou Han
,
Mai Zheng
,
Suren Byna
,
Houjun Tang
,
Bin Dong
,
Dong Dai
,
Yong Chen
,
Dongkyun Kim
,
Joseph Hassoun
,
David Thorsley
PROV-IO+: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems.
IEEE Trans. Parallel Distributed Syst.
35 (5) (2024)
Elliot Kolker-Hicks
,
Di Zhang
,
Dong Dai
A Reinforcement Learning Based Backfilling Strategy for HPC Batch Jobs.
CoRR
(2024)
Chris Egersdoerfer
,
Arnav Sareen
,
Jean Luca Bez
,
Suren Byna
,
Dong Dai
ION: Navigating the HPC I/O Optimization Journey using Large Language Models.
HotStorage
(2024)
Abdullah Al Raqibul Islam
,
Dong Dai
DGAP: Efficient Dynamic Graph Analysis on Persistent Memory.
CoRR
(2024)
Runzhou Han
,
Mai Zheng
,
Suren Byna
,
Houjun Tang
,
Bin Dong
,
Dong Dai
,
Yong Chen
,
Dongkyun Kim
,
Joseph Hassoun
,
David Thorsley
,
Matthew Wolf
PROV-IO+: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems.
CoRR
(2023)
Elliot Kolker-Hicks
,
Di Zhang
,
Dong Dai
A Reinforcement Learning Based Backfilling Strategy for HPC Batch Jobs.
SC Workshops
(2023)
Chris Egersdoerfer
,
Dong Dai
,
Di Zhang
ClusterLog: Clustering Logs for Effective Log-based Anomaly Detection.
CoRR
(2023)
Saisha Kamat
,
Abdullah Al Raqibul Islam
,
Mai Zheng
,
Dong Dai
FaultyRank: A Graph-based Parallel File System Checker.
IPDPS
(2023)
Di Zhang
,
Chris Egersdoerfer
,
Tabassum Mahmud
,
Mai Zheng
,
Dong Dai
Drill: Log-based Anomaly Detection for Large-scale Storage Systems Using Source Code Analysis.
IPDPS
(2023)
Chris Egersdoerfer
,
Di Zhang
,
Dong Dai
Early Exploration of Using ChatGPT for Log-based Anomaly Detection on Parallel File Systems Logs.
HPDC
(2023)
Dazhao Cheng
,
Yu Wang
,
Dong Dai
Dynamic Resource Provisioning for Iterative Workloads on Apache Spark.
IEEE Trans. Cloud Comput.
11 (1) (2023)
Md. Hasanur Rashid
,
Youbiao He
,
Forrest Sheng Bao
,
Dong Dai
IOPathTune: Adaptive Online Parameter Tuning for Parallel File System I/O Path.
CoRR
(2023)
Abdullah Al Raqibul Islam
,
Dong Dai
DGAP: Efficient Dynamic Graph Analysis on Persistent Memory.
SC
(2023)
Abdullah Al Raqibul Islam
,
Christopher York
,
Dong Dai
A performance study of optane persistent memory: from storage data structures' perspective.
CCF Trans. High Perform. Comput.
4 (4) (2022)
Runzhou Han
,
Om Rameshwar Gatla
,
Mai Zheng
,
Jinrui Cao
,
Di Zhang
,
Dong Dai
,
Yong Chen
,
Jonathan E. Cook
A Study of Failure Recovery and Logging of High-Performance Parallel File Systems.
ACM Trans. Storage
18 (2) (2022)
Chris Egersdoerfer
,
Di Zhang
,
Dong Dai
ClusterLog: Clustering Logs for Effective Log-based Anomaly Detection.
FTXS@SC
(2022)
Abdullah Al Raqibul Islam
,
Dong Dai
,
Dazhao Cheng
VCSR: Mutable CSR Graph Format Using Vertex-Centric Packed Memory Array.
CCGRID
(2022)
Di Zhang
,
Dong Dai
,
Bing Xie
SchedInspector: A Batch Job Scheduling Inspector Using Reinforcement Learning.
HPDC
(2022)
Jiang Zhou
,
Yong Chen
,
Dong Dai
,
Yu Zhuang
,
Weiping Wang
I/O characteristic discovery for storage system optimizations.
J. Parallel Distributed Comput.
148 (2021)
Dong Dai
,
Yong Chen
,
Dries Kimpe
,
Robert B. Ross
Trigger-Based Incremental Data Processing with Unified Sync and Async Model.
IEEE Trans. Cloud Comput.
9 (1) (2021)
Di Zhang
,
Dong Dai
,
Runzhou Han
,
Mai Zheng
SentiLog: Anomaly Detecting on Parallel File Systems via Log-based Sentiment Analysis.
HotStorage
(2021)
Abdullah Al Raqibul Islam
,
Dong Dai
Understand the overheads of storage data structures on persistent memory.
PPoPP
(2020)
Jiang Zhou
,
Yong Chen
,
Wei Xie
,
Dong Dai
,
Shuibing He
,
Weiping Wang
PRS: A Pattern-Directed Replication Scheme for Heterogeneous Object-Based Storage.
IEEE Trans. Computers
69 (4) (2020)
Di Zhang
,
Dong Dai
,
Youbiao He
,
Forrest Sheng Bao
,
Bing Xie
RLScheduler: an automated HPC batch job scheduler using reinforcement learning.
SC
(2020)
Li Ruan
,
Xiangrong Xu
,
Limin Xiao
,
Feng Yuan
,
Yin Li
,
Dong Dai
A Comparative Study of Large-Scale Cluster Workload Traces via Multiview Analysis.
HPCC/SmartCity/DSS
(2019)
Neda Tavakoli
,
Dong Dai
,
Yong Chen
Client-side straggler-aware I/O scheduler for object-based parallel file systems.
Parallel Comput.
82 (2019)
Dong Dai
,
Yong Chen
,
Philip H. Carns
,
John Jenkins
,
Wei Zhang
,
Robert B. Ross
Managing Rich Metadata in High-Performance Computing Systems Using a Graph Model.
IEEE Trans. Parallel Distributed Syst.
30 (7) (2019)
Di Zhang
,
Dong Dai
,
Youbiao He
,
Forrest Sheng Bao
RLScheduler: Learn to Schedule HPC Batch Jobs Using Deep Reinforcement Learning.
CoRR
(2019)
Dong Dai
,
Forrest Sheng Bao
,
Jiang Zhou
,
Xuanhua Shi
,
Yong Chen
Vectorizing disks blocks for efficient storage system via deep learning.
Parallel Comput.
82 (2019)
Dong Dai
,
Om Rameshwar Gatla
,
Mai Zheng
A Performance Study of Lustre File System Checker: Bottlenecks and Potentials.
MSST
(2019)
Youbiao He
,
Dong Dai
,
Forrest Sheng Bao
Modeling HPC Storage Performance Using Long Short-Term Memory Networks.
HPCC/SmartCity/DSS
(2019)
Jiang Zhou
,
Dong Dai
,
Yu Mao
,
Xin Chen
,
Yu Zhuang
,
Yong Chen
I/O Characteristics Discovery in Cloud Storage Systems.
IEEE CLOUD
(2018)
Neda Tavakoli
,
Dong Dai
,
John Jenkins
,
Philip H. Carns
,
Robert B. Ross
,
Yong Chen
A Software-Defined Approach for QoS Control in High-Performance Computing Storage Systems.
CoRR
(2018)
Jinrui Cao
,
Om Rameshwar Gatla
,
Mai Zheng
,
Dong Dai
,
Vidya Eswarappa
,
Yan Mu
,
Yong Chen
PFault: A General Framework for Analyzing the Reliability of High-Performance Parallel File Systems.
ICS
(2018)
Dong Dai
,
Robert B. Ross
,
Dounia Khaldi
,
Yonghong Yan
,
Matthieu Dorier
,
Neda Tavakoli
,
Yong Chen
A Cross-Layer Solution in Scientific Workflow System for Tackling Data Movement Challenge.
CoRR
(2018)
Wei Zhang
,
Yong Chen
,
Dong Dai
AKIN: A Streaming Graph Partitioning Algorithm for Distributed Graph Storage Systems.
CCGrid
(2018)
Neda Tavakoli
,
Dong Dai
,
Yong Chen
Client-side Straggler-Aware I/O Scheduler for Object-based Parallel File Systems.
CoRR
(2018)
Wenke Li
,
Xuanhua Shi
,
Hong Huang
,
Peng Zhao
,
Hai Jin
,
Dong Dai
,
Yong Chen
GRAM: A GPU-Based Property Graph Traversal and Query for HPC Rich Metadata Management.
NPC
(2018)
Dong Dai
,
Wei Zhang
,
Yong Chen
POSTER: IOGP: An Incremental Online Graph Partitioning for Large-Scale Distributed Graph Databases.
PPoPP
(2017)
Alan Sill
,
Yong Chen
,
Jon R. Hass
,
Dong Dai
,
Jerry Perez
DAAC Workshop Chairs' Welcome.
UCC (Companion)
(2017)
Dong Dai
,
Wei Zhang
,
Yong Chen
IOGP: An Incremental Online Graph Partitioning Algorithm for Distributed Graph Databases.
HPDC
(2017)
Jiang Zhou
,
Wei Xie
,
Dong Dai
,
Yong Chen
Pattern-Directed Replication Scheme for Heterogeneous Object-based Storage.
CCGrid
(2017)
Dong Dai
,
Yong Chen
,
Philip H. Carns
,
John Jenkins
,
Robert B. Ross
Lightweight Provenance Service for High-Performance Computing.
PACT
(2017)
Chao Wang
,
Dong Dai
,
Xi Li
,
Aili Wang
,
Xuehai Zhou
SuperMIC: Analyzing Large Biological Datasets in Bioinformatics with Maximal Information Coefficient.
IEEE ACM Trans. Comput. Biol. Bioinform.
14 (4) (2017)
Dong Dai
,
Yong Chen
,
Philip H. Carns
,
John Jenkins
,
Wei Zhang
,
Robert B. Ross
GraphMeta: A Graph-Based Engine for Managing Large-Scale HPC Rich Metadata.
CLUSTER
(2016)
Gangyong Jia
,
Liang Shi
,
Xi Li
,
Dong Dai
PUMA: From Simultaneous to Parallel for Shared Memory System in Multi-core.
J. Signal Process. Syst.
84 (1) (2016)
Neda Tavakoli
,
Dong Dai
,
Yong Chen
Log-Assisted Straggler-Aware I/O Scheduler for High-End Computing.
ICPP Workshops
(2016)
Dong Dai
,
Forrest Sheng Bao
,
Jiang Zhou
,
Yong Chen
Block2Vec: A Deep Learning Strategy on Mining Block Correlations in Storage Systems.
ICPP Workshops
(2016)
Dong Dai
,
Philip H. Carns
,
Robert B. Ross
,
John Jenkins
,
Nicholas Muirhead
,
Yong Chen
An asynchronous traversal engine for graph-based rich metadata management.
Parallel Comput.
58 (2016)
Jinrui Cao
,
Simeng Wang
,
Dong Dai
,
Mai Zheng
,
Yong Chen
A Generic Framework for Testing Parallel File Systems.
PDSW-DISCS@SC
(2016)
Dong Dai
,
Philip H. Carns
,
Robert B. Ross
,
John Jenkins
,
Kyle Blauer
,
Yong Chen
GraphTrek: Asynchronous Graph Traversal for Property Graph-Based Metadata Management.
CLUSTER
(2015)
Dong Dai
,
Yong Chen
,
Dries Kimpe
,
Robert B. Ross
Provenance-based object storage prediction scheme for scientific big data applications.
IEEE BigData
(2014)
Gangyong Jia
,
Xi Li
,
Youwei Yuan
,
Jian Wan
,
Congfeng Jiang
,
Dong Dai
PseudoNUMA for reducing memory interference in multi-core systems.
SpringSim (HPS)
(2014)
Changlong Li
,
Xuehai Zhou
,
Mingming Sun
,
Kun Lu
,
Jinhong Zhou
,
Hang Zhuang
,
Dong Dai
DLBS: Decentralized load balancing scheme for event-driven cloud frameworks.
ICPADS
(2014)
Changlong Li
,
Hang Zhuang
,
Kun Lu
,
Mingming Sun
,
Jinhong Zhou
,
Dong Dai
,
Xuehai Zhou
An Adaptive Auto-configuration Tool for Hadoop.
ICECCS
(2014)
Gangyong Jia
,
Liang Shi
,
Jian Wan
,
Youwei Yuan
,
Xi Li
,
Dong Dai
PUMA: Pseudo unified memory architecture for single-ISA heterogeneous multi-core systems.
RTCSA
(2014)
Dong Dai
,
Yong Chen
,
Dries Kimpe
,
Robert B. Ross
,
Xuehai Zhou
Domino: an incremental computing framework in cloud with eventual synchronization.
HPDC
(2014)
Dong Dai
,
Yong Chen
,
Dries Kimpe
,
Robert B. Ross
Provenance-Based Prediction Scheme for Object Storage System in HPC.
CCGRID
(2014)
Gangyong Jia
,
Youwei Yuan
,
Jian Wan
,
Congfeng Jiang
,
Xi Li
,
Dong Dai
Temperature-Aware Scheduling Based on Dynamic Time-Slice Scaling.
ICA3PP (1)
(2014)
Kun Lu
,
Dong Dai
,
Xuehai Zhou
,
Mingming Sun
,
Changlong Li
,
Hang Zhuang
Unbinds data and tasks to improving the Hadoop performance.
SNPD
(2014)
Mingming Sun
,
Xuehai Zhou
,
Feng Yang
,
Kun Lu
,
Dong Dai
Bwasw-Cloud: Efficient sequence alignment algorithm for two big data with MapReduce.
ICADIWT
(2014)
Gangyong Jia
,
Guangjie Han
,
Liang Shi
,
Jian Wan
,
Dong Dai
Combine thread with memory scheduling for maximizing performance in multi-core systems.
ICPADS
(2014)
Dong Dai
,
Robert B. Ross
,
Philip H. Carns
,
Dries Kimpe
,
Yong Chen
Using property graphs for rich metadata management in HPC systems.
PDSW@SC
(2014)
Dong Dai
,
Yong Chen
,
Dries Kimpe
,
Robert B. Ross
Two-Choice Randomized Dynamic I/O Scheduler for Object Storage Systems.
SC
(2014)
Gangyong Jia
,
Xi Li
,
Jian Wan
,
Chao Wang
,
Dong Dai
,
Congfeng Jiang
Coordinate Task and Memory Management for Improving Power Efficiency.
ICA3PP (1)
(2013)
Dong Dai
,
Xi Li
,
Chao Wang
,
Junneng Zhang
,
Xuehai Zhou
Detecting Associations in Large Dataset on MapReduce.
TrustCom/ISPA/IUCC
(2013)
Gangyong Jia
,
Xi Li
,
Jian Wan
,
Chao Wang
,
Dong Dai
Group Scheduling for Improving Both CPU and Memory Power Efficiency Simultaneously.
HPCC/EUC
(2013)
Dong Dai
,
Xi Li
,
Chao Wang
,
Xuehai Zhou
Cloud Based Short Read Mapping Service.
CLUSTER
(2012)
Chao Wang
,
Xi Li
,
Dong Dai
,
Gangyong Jia
,
Xuehai Zhou
Phase Detection for Loop-Based Programs on Multicore Architectures.
CLUSTER
(2012)
Dong Dai
,
Xi Li
,
Chao Wang
,
Mingming Sun
,
Xuehai Zhou
Sedna: A Memory Based Key-Value Storage System for Realtime Processing in Cloud.
CLUSTER Workshops
(2012)