Accelerating a Triton Fused Kernel for W4A16 Quantized Inference with SplitK work decomposition.

Published in: CoRR (2024)

Keyphrases