Monitoring and Predicting Hardware Failures in HPC Clusters with FTB-IPMI.
Raghunath Rajachandrasekar, Xavier Besseron, Dhabaleswar K. Panda
Published in: IPDPS Workshops (2012)
Keyphrases
- real time
- failure detection
- clustering algorithm
- high performance computing
- low cost
- scientific computing
- data acquisition
- massively parallel
- hardware and software
- data points
- computing systems
- hierarchical clustering
- hardware implementation
- overlapping clusters
- cluster analysis
- fault tolerance
- monitoring system
- data clustering
- parallel hardware
- hardware software
- early warning
- decision support
- fuzzy c means
- hardware design
- hardware architecture
- data management
- microarray
- subspace clustering