Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases.
Xiaoxia Wu
Cheng Li
Reza Yazdani Aminabadi
Zhewei Yao
Yuxiong He
Published in:
CoRR (2023)
Keyphrases
computational modeling
parametric models
multiscale
statistical models
probabilistic model
response time
mathematical models
computational models
experimental data
process model
complex systems
data mining
computational complexity
image sequences
information systems
search engine
machine learning