GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics.
Maxim Zvyagin, Alexander Brace, Kyle Hippe, Yuntian Deng, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, Carla M. Mann, Michael W. Irvin, Defne G. Ozgulbas, Natalia Vassilieva, J. Gregory Pauloski, Logan Ward, Valerie Hayot-Sasson, Murali Emani, Sam Foreman, Zhen Xie, Diangen Lin, Maulik Shukla, Weili Nie, Josh Romero, Christian Dallago, Arash Vahdat, Chaowei Xiao, Thomas Gibbs, Ian T. Foster, James J. Davis, Michael E. Papka, Thomas S. Brettin, Rick Stevens, Anima Anandkumar, Venkatram Vishwanath, Arvind Ramanathan
Published in: Int. J. High Perform. Comput. Appl. (2023)
Keyphrases
- language model
- genome scale
- language modeling
- sequence similarity
- n gram
- metabolic pathways
- information retrieval
- speech recognition
- probabilistic model
- retrieval model
- systems biology
- language modelling
- test collection
- query expansion
- statistical language models
- high throughput
- vector space model
- relevance model
- protein protein interactions
- protein function
- smoothing methods
- dna binding
- biological data
- language models for information retrieval
- saccharomyces cerevisiae
- genomic data
- computational methods
- knowledge discovery
- similarity measure