FlattenQuant: Breaking Through the Inference Compute-bound for Large Language Models with Per-tensor Quantization.

Yi Zhang, Fei Yang, Shuang Peng, Fangyu Wang, Aimin Pan
Published in: CoRR (2024)
Keyphrases