POSTER: LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization.

Juntao Zhao, Borui Wan, Chuan Wu, Yanghua Peng, Haibin Lin
Published in: PPoPP (2024)
Keyphrases
  • adaptive quantization
  • shape coding
  • rate distortion
  • disjoint clusters
  • low bit rate
  • clustering algorithm
  • subband coding
  • image compression
  • subband
  • high order
  • compression algorithm