Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity.
Haojun Xia, Zhen Zheng, Yuchao Li, Donglin Zhuang, Zhongzhu Zhou, Xiafei Qiu, Yong Li, Wei Lin, Shuaiwen Leon Song
Published in: CoRR (2023)
Keyphrases
- cost effective
- generative model
- highly efficient
- low cost
- boltzmann machine
- markov chain monte carlo
- probabilistic model
- discriminative learning
- cost effectiveness
- dirichlet process mixture models
- prior knowledge
- em algorithm
- conditional random fields
- discriminative models
- bayesian framework
- latent dirichlet allocation
- pitman yor process
- topic models
- sparse representation
- low complexity
- generative process
- bayesian inference
- structured prediction
- gibbs sampling
- semi supervised
- bayesian networks
- machine learning
- bayesian model
- real time
- expectation maximization
- image sequences
- computer vision