Sifting through genomes with iterative-sequence clustering produces a large, phylogenetically diverse protein-family resource.
Thomas J. Sharpton, Guillaume Jospin, Dongying Wu, Morgan G. I. Langille, Katherine S. Pollard, Jonathan A. Eisen
Published in: BMC Bioinform. (2012)
Keyphrases
- protein families
- protein sequences
- DNA sequences
- amino acids
- clustering algorithm
- sequence analysis
- genomic sequences
- protein structure
- k-means
- clustering method
- binding sites
- structural motifs
- information theoretic
- sequence alignment
- amino acid sequences
- computational biology
- structural properties
- document clustering
- genome sequences
- similarity measure
- machine learning