GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference.
Ali Hadi Zadeh
Isak Edo
Omar Mohamed Awad
Andreas Moshovos
Published in:
MICRO (2020)
Keyphrases
efficient inference
low latency
real time
probabilistic model
human pose estimation
hidden variables
high speed
structured prediction
machine learning
parameter estimation
conditional random fields
highly efficient
gaussian processes
fully connected