Tree visualizations of protein sequence embedding space enable improved functional clustering of diverse protein superfamilies.
Wayland Yeung, Zhongliang Zhou, Liju Mathew, Nathan Gravel, Rahil Taujale, Brady O'boyle, Mariah Salcedo, Aarya Venkat, William Lanzilotta, Sheng Li, Natarajan Kannan
Published in: Briefings Bioinform. (2023)
Keyphrases
- protein sequences
- amino acids
- computational biology
- protein structure
- protein function
- embedding space
- amino acid sequences
- secondary structure
- protein classification
- protein structure prediction
- data points
- sequence analysis
- k-means
- predicting protein
- protein families
- amino acid composition
- protein secondary structure
- Euclidean space
- graph embedding
- structural motifs
- input space
- manifold learning
- multiple sequence alignments
- protein structural
- cluster analysis
- low dimensional
- protein protein
- dimensionality reduction
- sequence alignment
- geometric structure
- high dimensional