LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models.
Gunho Park, Baeseong Park, Minsub Kim, Sungjae Lee, Jeonghoon Kim, Beomseok Kwon, Se Jung Kwon, Byeongwook Kim, Youngjoo Lee, Dongsoo Lee
Published in: ICLR (2024)
Keyphrases
- language model
- efficient inference
- matrix multiplication
- language modeling
- probabilistic inference
- message passing
- probabilistic model
- approximate inference
- generative model
- conditional random fields
- hidden variables
- n-gram
- human pose estimation
- information retrieval
- markov random field
- exact inference
- graphical models
- fully connected
- language modeling framework
- graph structure
- junction tree
- smoothing methods
- markov networks
- structured prediction
- belief propagation
- bayesian networks
- machine learning
- text categorization
- shared memory
- distributed memory
- image sequences
- semi-supervised
- three-dimensional
- model selection