GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference.
Ali Hadi Zadeh
Andreas Moshovos
Published in: CoRR (2020)
Keyphrases
efficient inference
low latency
probabilistic inference
structured prediction
conditional random fields
hidden variables
probabilistic model
information extraction
high speed
energy consumption
message passing
approximate inference
linear models
fully connected