Alignment-free clustering of large data sets of unannotated protein conserved regions using minhashing.
Armen Abnousi, Shira L. Broschat, Ananth Kalyanaraman
Published in: BMC Bioinform. (2018)
Keyphrases
- sequence alignment
- transcription factor binding sites
- rna sequences
- multiple sequence alignment
- protein sequences
- clustering algorithm
- data sets
- k means
- protein coding regions
- clustering method
- pairwise
- dna sequences
- hierarchical clustering
- nucleotide sequences
- data reduction
- protein structure alignment
- data analysis
- phylogenetic analysis
- multiple sequence alignments
- multiple alignment
- arbitrary shape
- amino acids
- rna secondary structures
- input image
- high dimensional data
- binding sites
- image regions