Fail-Slow at Scale: Evidence of Hardware Performance Faults in Large Production Systems.
Haryadi S. Gunawi, Riza O. Suminto, Russell Sears, Casey Golliher, Swaminathan Sundararaman, Xing Lin, Tim Emami, Weiguang Sheng, Nematollah Bidokhti, Caitie McCaffrey, Deepthi Srinivasan, Biswaranjan Panda, Andrew Baptist, Gary Grider, Parks M. Fields, Kevin Harms, Robert B. Ross, Andree Jacobson, Robert Ricci, Kirk Webb, Peter Alvaro, H. Birali Runesha, Mingzhe Hao, Huaicheng Li
Published in: ACM Trans. Storage (2018)
Keyphrases
- production system
- production process
- multistage
- multiprocessor architecture
- error detection
- hardware and software
- low cost
- fault detection
- certainty factor
- expert systems
- control structure
- fault diagnosis
- hardware implementation
- empirical evidence
- markov decision
- real time
- vlsi implementation
- scale space
- production rules
- massively parallel
- scheduling problem
- production line
- hardware architecture
- lot streaming
- engineering design
- error correction
- data acquisition
- test cases
- np hard
- search algorithm
- image processing
- neural network