FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU.
Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Beidi Chen, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang
Published in: ICML (2023)
Keyphrases
- high throughput
- language model
- language modeling
- microarray
- genome wide
- probabilistic model
- n gram
- language modelling
- biological data
- systems biology
- information retrieval
- statistical language models
- document retrieval
- retrieval model
- test collection
- protein protein interactions
- data acquisition
- speech recognition
- genomic data
- mass spectrometry
- vector space model
- generative model
- language models for information retrieval
- query expansion
- image analysis
- document ranking
- smoothing methods
- clustering algorithm
- real time
- proteomic data
- bayesian networks
- language modeling framework