Leveraging protein language model embeddings and logistic regression for efficient and accurate in-silico acidophilic proteins classification.
Meredita Susanty, Muhammad K. N. Mursalim, Rukman Hertadi, Ayu Purwarianti, Tati L. R. Mengko
Published in: Comput. Biol. Chem. (2024)
Keyphrases
- logistic regression
- language model
- decision trees
- support vector
- logistic regression models
- language modeling
- fold cross validation
- linear support vector machines
- amino acids
- protein sequences
- naive bayes
- n gram
- linear svm
- classification accuracy
- retrieval model
- document retrieval
- protein protein interactions
- query expansion
- information retrieval
- probabilistic model
- training set
- machine learning
- support vector machine
- loss function
- mixture model
- feature selection
- query terms
- bayesian classifiers
- translation model
- training data
- vector space model
- text classification
- feature ranking
- data analysis
- similarity measure
- feature space
- class labels