FlattenQuant: Breaking Through the Inference Compute-bound for Large Language Models with Per-tensor Quantization.

Published in: CoRR (2024)

Keyphrases