A highly parallel next-generation DNA sequencing data analysis pipeline in Hadoop.
Kareem S. AggourVijay S. KumarDipen P. SangurdekarLee A. NewbergChinnappa D. KodiraPublished in: BIBM (2015)
Keyphrases
- highly parallel
- dna sequencing
- data analysis
- life sciences
- big data
- dna sequences
- high throughput
- efficient implementation
- open source
- parallel architectures
- biological data
- data processing
- data collection
- cloud computing
- distributed systems
- data mining
- computing systems
- single chip
- general purpose
- parallel programming
- data integration
- single pass
- data management
- stream processing
- machine learning
- scientific data
- data acquisition
- graphics processing units
- data intensive
- data warehouse
- distributed computing
- data access
- knowledge discovery