Protein language models trained on multiple sequence alignments learn phylogenetic relationships.
Umberto Lupo, Damiano Sgarbossa, Anne-Florence Bitbol
Published in: CoRR (2022)
Keyphrases
- language model
- multiple sequence alignments
- phylogenetic trees
- protein sequences
- language modeling
- multiple sequence alignment
- document retrieval
- n gram
- probabilistic model
- sequence alignment
- protein protein interactions
- pairwise
- query expansion
- retrieval model
- statistical language models
- information retrieval
- smoothing methods
- test collection
- computational biology
- protein structure
- multiple alignment
- relevance model
- machine learning
- protein structure prediction
- memory efficient
- language models for information retrieval