Experiences Detecting Defective Hardware in Exascale Supercomputers.
Nick Hagerty, Jordan Webb, Verónica G. Melesse Vergara, Matthew Ezell
Published in: SC Workshops (2023)
Keyphrases
- high performance computing
- massively parallel
- scientific computing
- computing systems
- blue gene
- parallel computing
- real time
- hardware and software
- case study
- low cost
- computer systems
- field programmable gate array
- processing units
- energy efficiency
- fine grained
- fault tolerance
- personal computer
- grid computing
- hardware implementation
- graphics processing units
- computing environments
- distributed memory
- general purpose
- energy consumption
- hardware architecture
- response time