Fault-tolerant Routing for Multiple Permanent and Non-permanent Faults in HPC Systems.
Gonzalo Zarza, Diego Lugones, Daniel Franco, Emilio Luque
Published in: PDPTA (2010)
Keyphrases
- fault tolerant
- fault tolerance
- distributed systems
- safety critical
- interconnection networks
- error detection
- high availability
- load balancing
- fault isolation
- high assurance
- state machine
- scientific computing
- distributed computing
- expert systems
- physical systems
- computer systems
- management system
- parallel algorithm
- routing protocol
- error correction
- sensor data
- fault diagnosis
- shortest path
- intelligent systems
- wireless sensor networks
- database systems