Machine Learning Models for GPU Error Prediction in a Large Scale HPC System.
Bin Nie, Ji Xue, Saurabh Gupta, Tirthak Patel, Christian Engelmann, Evgenia Smirni, Devesh Tiwari
Published in: DSN (2018)
Keyphrases
- machine learning models
- predictive model
- prediction error
- machine learning algorithms
- machine learning approaches
- spam filtering
- prediction model
- high performance computing
- prediction accuracy
- machine learning
- fault tolerance
- graphics processing units
- multi class
- parallel computing
- graphical models
- real world
- hidden markov models
- training data
- decision trees
- learning algorithm