Don Maxwell
Publication Activity (10 Years)
Years Active: 2015-2021
Publications (10 Years): 7
Top Topics
Survival Analysis
Weighted Tardiness
Wafer Fabrication
Lessons Learned
Top Venues
SC
Euro-Par
SBAC-PAD
ISC Workshops
Publications
Elvis Rojas, Esteban Meneses, Terry Jones, Don Maxwell
Understanding failures through the lifetime of a top-level supercomputer.
J. Parallel Distributed Comput. 154 (2021)
George Ostrouchov, Don Maxwell, Rizwan A. Ashraf, Christian Engelmann, Mallikarjun Shankar, James H. Rogers
GPU lifetimes on Titan supercomputer: survival analysis and reliability.
SC (2020)
Elvis Rojas, Esteban Meneses, Terry Jones, Don Maxwell
Towards a Model to Estimate the Reliability of Large-Scale Hybrid Supercomputers.
Euro-Par (2020)
Verónica G. Vergara Larrea, Wayne Joubert, Michael J. Brim, Reuben D. Budiardja, Don Maxwell, Matthew Ezell, Christopher Zimmer, Swen Boehm, Wael R. Elwasif, Sarp Oral, Chris Fuson, Daniel Pelfrey, Oscar R. Hernandez, Dustin Leverman, Jesse Hanley, Mark A. Berrill, Arnold N. Tharrington
Scaling the Summit: Deploying the World's Fastest Supercomputer.
ISC Workshops (2019)
Verónica G. Vergara Larrea, Michael J. Brim, Wayne Joubert, Swen Boehm, Matthew B. Baker, Oscar R. Hernandez, Sarp Oral, James Simmons, Don Maxwell
Are we witnessing the spectre of an HPC meltdown?
Concurr. Comput. Pract. Exp. 31 (16) (2019)
Elvis Rojas, Esteban Meneses, Terry Jones, Don Maxwell
Analyzing a Five-Year Failure Record of a Leadership-Class Supercomputer.
SBAC-PAD (2019)
Christopher Zimmer, Don Maxwell, Stephen McNally, Scott Atchley, Sudharshan S. Vazhkudai
GPU age-aware scheduling to improve the reliability of leadership jobs on Titan.
SC (2018)
Devesh Tiwari, Saurabh Gupta, James H. Rogers, Don Maxwell, Paolo Rech, Sudharshan S. Vazhkudai, Daniel Oliveira, Dave Londo, Nathan DeBardeleben, Philippe Olivier Alexandre Navaux, Luigi Carro, Arthur S. Bland
Understanding GPU errors on large-scale HPC systems and the implications for system design and operation.
HPCA (2015)
Saurabh Gupta, Devesh Tiwari, Christopher Jantzi, James H. Rogers, Don Maxwell
Understanding and Exploiting Spatial Properties of System Failures on Extreme-Scale HPC Systems.
DSN (2015)
Devesh Tiwari, Saurabh Gupta, George Gallarno, Jim Rogers, Don Maxwell
Reliability lessons learned from GPU experience with the Titan supercomputer at Oak Ridge Leadership Computing Facility.
SC (2015)