Choosing Non-redundant Representative Subsets Of Protein Sequence Data Sets Using Submodular Optimization.
Maxwell W. Libbrecht, Jeffrey A. Bilmes, William Stafford Noble
Published in: BCB (2018)
Keyphrases
- protein sequences
- data sets
- protein structure
- computational biology
- amino acids
- secondary structure
- protein secondary structure
- greedy algorithm
- computational methods
- protein folding
- genome sequences
- protein protein
- min sum
- protein classification
- coarse grained
- amino acid composition
- protein structural
- protein structure and function
- structural motifs
- gene expression data
- data analysis