POSTER: LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization.

Juntao Zhao, Borui Wan, Chuan Wu, Yanghua Peng, Haibin Lin
Published in: PPoPP (2024)
Keyphrases
  • adaptive quantization
  • shape coding
  • rate distortion
  • disjoint clusters
  • low bit rate
  • clustering algorithm
  • subband coding
  • image compression
  • subband
  • high order
  • compression algorithm