Fail-Slow at Scale: Evidence of Hardware Performance Faults in Large Production Systems.
Haryadi S. Gunawi, Riza O. Suminto, Russell Sears, Casey Golliher, Swaminathan Sundararaman, Xing Lin, Tim Emami, Weiguang Sheng, Nematollah Bidokhti, Caitie McCaffrey, Gary Grider, Parks M. Fields, Kevin Harms, Robert B. Ross, Andree Jacobson, Robert Ricci, Kirk Webb, Peter Alvaro, H. Birali Runesha, Mingzhe Hao, Huaicheng Li
Published in: FAST (2018)
Keyphrases
- production system
- multistage
- low cost
- production process
- control structure
- production rules
- markov decision
- hardware and software
- fault diagnosis
- empirical evidence
- expert systems
- multiprocessor architecture
- real time
- certainty factor
- error detection
- computing systems
- production line
- scheduling jobs
- hardware implementation
- embedded systems
- computer systems
- neural network
- fault detection
- massively parallel
- model based diagnosis
- parallel algorithm
- test cases
- hardware architecture
- multiple faults
- search algorithm
- image processing