nuQmm: Quantized MatMul for Efficient Inference of Large-Scale Generative Language Models.
Gunho Park, Baeseong Park, Se Jung Kwon, Byeongwook Kim, Youngjoo Lee, Dongsoo Lee
Published in: CoRR (2022)
Keyphrases
- language model
- efficient inference
- language modeling
- probabilistic model
- probabilistic inference
- conditional random fields
- n-gram
- generative model
- retrieval model
- information retrieval
- hidden variables
- markov random field
- fully connected
- exact inference
- approximate inference
- human pose estimation
- structured prediction
- language models for information retrieval
- language modeling framework
- markov networks
- graphical models
- query expansion
- belief propagation
- factor graphs
- smoothing methods
- relevance model
- machine learning
- message passing