CRISPR sequences are sometimes erroneously translated and can contaminate public databases with spurious proteins containing spaced repeats.
Alejandro Rubio, Pablo Mier, Miguel A. Andrade-Navarro, Andrés Garzón, Juan Jiménez, Antonio J. Pérez-Pulido
Published in: Database J. Biol. Databases Curation (2020)
Keyphrases
- public databases
- sequence analysis
- protein sequences
- sequence data
- Homo sapiens
- computational biology
- sequence similarity
- databases
- biological information
- amino acid sequences
- hidden Markov models
- physicochemical properties
- phylogenetic trees
- sequential patterns
- amino acids
- protein structure
- protein function
- sequence databases
- protein-protein interactions
- living cells
- text mining