A fast hierarchical clustering algorithm for large-scale protein sequence data sets.
Sándor M. Szilágyi, László Szilágyi
Published in: Comput. Biol. Medicine (2014)
Keyphrases
- protein sequences
- hierarchical clustering algorithm
- data sets
- hierarchical clustering
- parameter free
- amino acids
- protein structure
- clustering algorithm
- secondary structure
- clustering method
- protein secondary structure
- structural motifs
- protein structure prediction
- categorical data
- protein structural
- gene expression data
- data streams
- similarity measure
- protein folding
- training set
- protein secondary structure prediction
- genome sequences
- data records
- feature selection
- unsupervised learning
- test data
- arbitrary shape
- high dimensional data
- database
- cluster analysis