A low-level software-based fault tolerance approach to detect SEUs in GPUs' register files.
Marcio Gonçalves, Mateus Saquetti, Fernanda Lima Kastensmidt, José Rodrigo Azambuja
Published in: Microelectron. Reliab. (2017)
Keyphrases
- fault tolerance
- fault tolerant
- low level
- load balancing
- distributed computing
- high availability
- high level
- distributed systems
- replicated databases
- response time
- peer to peer
- mobile agents
- database replication
- group communication
- fault management
- failure recovery
- computer systems
- high performance computing
- single point of failure
- error detection
- end to end
- software systems
- data replication
- software components
- data streams
- e learning
- databases