Language models can generate molecules, materials, and protein binding sites directly in three dimensions as XYZ, CIF, and PDB files.
Daniel Flam-Shepherd, Alán Aspuru-Guzik
Published in: CoRR (2023)
Keyphrases
- language model
- binding sites
- virtual screening
- dna binding
- protein protein
- protein structure
- language modeling
- dna sequences
- sequence alignment
- sequence data
- gene expression
- regulatory elements
- transcription factor binding sites
- drosophila melanogaster
- n gram
- document retrieval
- transcription factors
- statistical significance
- probabilistic model
- speech recognition
- protein interaction
- information retrieval
- retrieval model
- query expansion
- protein structure prediction
- protein sequences
- motif discovery
- vector space model
- query terms
- protein families
- test collection
- smoothing methods
- influenza virus
- computational methods
- microarray
- human genome
- genome wide
- molecular biology