Coping with silent and fail-stop errors at scale by combining replication and checkpointing.
Anne Benoit, Aurélien Cavelan, Franck Cappello, Padma Raghavan, Yves Robert, Hongyang Sun
Published in: J. Parallel Distributed Comput. (2018)
Keyphrases
- distributed databases
- fault tolerance
- fault tolerant
- distributed database systems
- distributed systems
- error detection
- neural network
- low overhead
- data replication
- main memory databases
- data sets
- error analysis
- estimation error
- distributed environment
- data warehousing
- least squares
- multiscale
- image processing
- search engine
- artificial intelligence
- machine learning
- data mining