FTXS@SC
Publications
2022
Bo Fang, Siva Kumar Sastry Hari, Timothy Tsai, Xinyi Li, Ganesh Gopalakrishnan, Ignacio Laguna, Kevin J. Barker, Ang Li
Towards Precision-Aware Fault Tolerance Approaches for Mixed-Precision Applications.
FTXS@SC
(2022)
Aurelien Bouteiller, George Bosilca
Implicit Actions and Non-blocking Failure Recovery with MPI.
FTXS@SC
(2022)
Chris Egersdoerfer, Di Zhang, Dong Dai
ClusterLog: Clustering Logs for Effective Log-based Anomaly Detection.
FTXS@SC
(2022)
Yehonatan Fridman, Yaniv Snir, Harel Levin, Danny Hendler, Hagit Attiya, Gal Oren
Recovery of Distributed Iterative Solvers for Linear Systems Using Non-Volatile RAM.
FTXS@SC
(2022)
Lukas Hübner, Demian Hespe, Peter Sanders, Alexandros Stamatakis
ReStore: In-Memory REplicated STORagE for Rapid Recovery in Fault-Tolerant Algorithms.
FTXS@SC
(2022)
12th IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale, FTXS@SC 2022, Dallas, TX, USA, November 13-18, 2022
FTXS@SC
(2022)
2021
11th IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale, FTXS@SC 2021, St. Louis, MO, USA, November 14, 2021
FTXS@SC
(2021)
Nathan DeBardeleben, Tom Burr, Stephen Penton, Craig Walker, Josip Loncaric, William M. Jones
Statistical Framework for Two-Party Acceptance Testing of HPC Systems for Reliability.
FTXS@SC
(2021)
Zheng Miao, Jon C. Calhoun, Rong Ge
Relaxed Replication for Energy Efficient and Resilient GPU Computing.
FTXS@SC
(2021)
Trokon Johnson, Herman Lam
Incorporating Fault-Tolerance Awareness into System-Level Modeling and Simulation.
FTXS@SC
(2021)
Yehonatan Fridman, Yaniv Snir, Matan Rusanovsky, Kfir Zvi, Harel Levin, Danny Hendler, Hagit Attiya, Gal Oren
Assessing the Use Cases of Persistent Memory in High-Performance Scientific Computing.
FTXS@SC
(2021)
Philipp Samfass, Tobias Weinzierl, Anne Reinarz, Michael Bader
Doubt and Redundancy Kill Soft Errors - Towards Detection and Correction of Silent Data Corruption in Task-based Numerical Software.
FTXS@SC
(2021)
2020
10th IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale, FTXS@SC 2020, Atlanta, GA, USA, November 11, 2020
FTXS@SC
(2020)
Romain Lion, Samuel Thibault
From tasks graphs to asynchronous distributed checkpointing with local restart.
FTXS@SC
(2020)
Hemanth Kolla, Jackson R. Mayo, Keita Teranishi, Robert C. Armstrong
Improving Scalability of Silent-Error Resilience for Message-Passing Solvers via Local Recovery and Asynchrony.
FTXS@SC
(2020)
Mohit Kumar, Christian Engelmann
Models for Resilience Design Patterns.
FTXS@SC
(2020)
Md Abdullah Shahneous Bari, Debasmita Basu, Wenbin Lu, Tony Curtis, Barbara M. Chapman
Checkpointing OpenSHMEM Programs Using Compiler Analysis.
FTXS@SC
(2020)
Scott Levy
Message from the Workshop Chair.
FTXS@SC
(2020)
Carlos Pachajoa, Robert Ernstbrunner, Wilfried N. Gansterer
A Generic Strategy for Node-Failure Resilience for Certain Iterative Linear Algebra Methods.
FTXS@SC
(2020)
Nikunj Gupta, Jackson R. Mayo, Adrian S. Lemoine, Hartmut Kaiser
Towards Distributed Software Resilience in Asynchronous Many-Task Programming Models.
FTXS@SC
(2020)
2019
Piyush Sao, Christian Engelmann, Srinivas Eswar, Oded Green, Richard W. Vuduc
Self-stabilizing Connected Components.
FTXS@SC
(2019)
Carlos Pachajoa, Christina Pacher, Wilfried N. Gansterer
Node-Failure-Resistant Preconditioned Conjugate Gradient Method without Replacement Nodes.
FTXS@SC
(2019)
Chun-Kai Chang, Guanpeng Li, Mattan Erez
Evaluating Compiler IR-Level Selective Instruction Duplication with Realistic Hardware Errors.
FTXS@SC
(2019)
9th IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale, FTXS@SC 2019, Denver, CO, USA, November 22, 2019
FTXS@SC
(2019)
Einar Horn, Dakota Fulp, Jon Calhoun, Luke Olson
FaultSight: A Fault Analysis Tool for HPC Researchers.
FTXS@SC
(2019)
Nuria Losada, Aurélien Bouteiller, George Bosilca
Asynchronous Receiver-Driven Replay for Local Rollback of MPI Applications.
FTXS@SC
(2019)
Tyler Coy, Xuechen Zhang
Enforcing Crash Consistency of Scientific Applications in Non-Volatile Main Memory Systems.
FTXS@SC
(2019)
2018
Nuria Losada, Leonardo Bautista-Gomez, Kai Keller, Osman S. Unsal
Towards Ad Hoc Recovery for Soft Errors.
FTXS@SC
(2018)
IEEE/ACM 8th Workshop on Fault Tolerance for HPC at eXtreme Scale, FTXS@SC 2018, Dallas, TX, USA, November 16, 2018
FTXS@SC
(2018)
Yawei Hui, Byung-Hoon Park, Christian Engelmann
A Comprehensive Informative Metric for Analyzing HPC System Status Using the LogSCAN Platform.
FTXS@SC
(2018)
Marc Platini, Thomas Ropars, Benoit Pelletier, Noel De Palma
CPU Overheating Characterization in HPC Systems: A Case Study.
FTXS@SC
(2018)
Anne Reinarz, Jean-Mathieu Gallard, Michael Bader
Influence of A-Posteriori Subcell Limiting on Fault Frequency in Higher-Order DG Schemes.
FTXS@SC
(2018)
Carlos Pachajoa, Markus Levonyak, Wilfried N. Gansterer
Extending and Evaluating Fault-Tolerant Preconditioned Conjugate Gradient Methods.
FTXS@SC
(2018)
Rizwan A. Ashraf, Christian Engelmann
Analyzing the Impact of System Reliability Events on Applications in the Titan Supercomputer.
FTXS@SC
(2018)
Neil Agarwal, Hugh Greenberg, Sean Blanchard, Nathan DeBardeleben
SaNSA - The Supercomputer and Node State Architecture.
FTXS@SC
(2018)
Alexandra Poulos, Dylan Wallace, Robert Robey, Laura Monroe, Vanessa Job, Sean Blanchard, William M. Jones, Nathan DeBardeleben
Improving Application Resilience by Extending Error Correction with Contextual Information.
FTXS@SC
(2018)
Felix Loh, Kewal K. Saluja, Parameswaran Ramanathan
Fault Tolerant Cholesky Factorization on GPUs.
FTXS@SC
(2018)