Hadoop and PySpark for Reproducibility and Scalability of Genomic Sequencing Studies.
Nicholas R. Wheeler, Penelope Benchek, Brian W. Kunkle, Kara L. Hamilton-Nelson, Mike Warfe, Jeremy R. Fondran, Jonathan L. Haines, William S. Bush
Published in: PSB (2020)
Keyphrases
- map reduce
- open source
- high throughput
- empirical studies
- high throughput sequencing
- neural network
- mapreduce framework
- database systems
- distributed systems
- cloud computing
- parallel algorithm
- knowledge discovery
- fault tolerant
- big data
- fault tolerance
- distributed computing
- social media
- computational approaches
- data sets