Adaptive Fault Management of Parallel Applications for High-Performance Computing.
Zhiling Lan, Yawei Li
Published in: IEEE Trans. Computers (2008)
Keyphrases
- high performance computing
- fault management
- fault tolerance
- massively parallel
- parallel computing
- message passing interface
- scientific computing
- fault tolerant
- computational science
- computing systems
- network management
- distributed systems
- knowledge based systems
- load balancing
- distributed computing
- grid computing
- response time
- energy efficiency
- parallel implementation
- mobile agents
- fine grained
- computing resources
- parallel machines
- computer systems
- peer to peer
- databases
- shared memory
- distributed memory
- software engineering
- multi agent systems
- multi agent