Flash-LLM: Enabling Low-Cost and Highly-Efficient Large Generative Model Inference With Unstructured Sparsity.
Haojun Xia, Zhen Zheng, Yuchao Li, Donglin Zhuang, Zhongzhu Zhou, Xiafei Qiu, Yong Li, Wei Lin, Shuaiwen Leon Song
Published in: Proc. VLDB Endow. (2023)
Keyphrases
- highly efficient
- generative model
- low cost
- high dimensional
- prior knowledge
- semi supervised
- low complexity
- real time
- sparse representation
- active learning
- search algorithm
- image processing