Understanding GPU errors on large-scale HPC systems and the implications for system design and operation.
Devesh Tiwari, Saurabh Gupta, James H. Rogers, Don Maxwell, Paolo Rech, Sudharshan S. Vazhkudai, Daniel Oliveira, Dave Londo, Nathan DeBardeleben, Philippe Olivier Alexandre Navaux, Luigi Carro, Arthur S. Bland
Published in: HPCA (2015)
Keyphrases
- technical systems
- case study
- design criteria
- interactive systems
- neural network
- complex systems
- design tools
- design issues
- design process
- knowledge based systems
- retrieval systems
- management system
- user interface
- expert systems
- data intensive
- parallel computing
- computing environments
- parallel implementation
- engineering design
- embedded systems
- intelligent systems
- real world