Subfamily specific conservation profiles for proteins based on n-gram patterns.
John K. Vries, Xiong Liu
Published in: BMC Bioinform. (2008)
Keyphrases
- n gram
- language model
- language independent
- language modeling
- text classification
- bag of words
- variable length
- viterbi algorithm
- word segmentation
- part of speech
- language modelling
- specific features
- protein sequences
- user profiles
- web documents
- classification accuracy
- bayesian networks
- information retrieval
- data mining
- language specific
- real world