FlattenQuant: Breaking through the Inference Compute-bound for Large Language Models with Per-tensor Quantization.
Yi Zhang, Fei Yang, Shuang Peng, Fangyu Wang, Aimin Pan
Published in: LREC/COLING (2024)
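For readers unfamiliar with the term in the title, the sketch below illustrates generic symmetric per-tensor INT8 quantization, where a single scale is shared by every element of a tensor (as opposed to per-channel or per-token scales). This is a minimal, assumed textbook scheme for illustration only; the function names `quantize_per_tensor` and `dequantize` are hypothetical and this is not the FlattenQuant method described in the paper.

```python
# Minimal sketch of symmetric per-tensor INT8 quantization.
# Illustrative only: a generic textbook scheme, NOT the FlattenQuant algorithm.
import numpy as np


def quantize_per_tensor(x: np.ndarray, num_bits: int = 8):
    """Quantize a float tensor using one scale shared by all elements."""
    qmax = 2 ** (num_bits - 1) - 1               # e.g. 127 for INT8
    scale = max(np.abs(x).max() / qmax, 1e-8)    # single per-tensor scale
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from integers and the shared scale."""
    return q.astype(np.float32) * scale


if __name__ == "__main__":
    x = np.random.randn(4, 8).astype(np.float32)
    q, s = quantize_per_tensor(x)
    err = np.abs(x - dequantize(q, s)).max()
    print(f"scale={s:.4f}, max abs reconstruction error={err:.4f}")
```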
Keyphrases
- language model
- language modeling
- n-gram
- probabilistic model
- speech recognition
- document retrieval
- information retrieval
- language modelling
- retrieval model
- statistical language models
- mixture model
- query expansion
- smoothing methods
- test collection
- context sensitive
- ad hoc information retrieval
- document length
- translation model
- bayesian networks
- query terms
- pseudo relevance feedback
- document ranking
- Okapi BM25
- statistical language modeling
- term dependencies
- relevance model
- search engine