QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving.

Yujun Lin, Haotian Tang, Shang Yang, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, Song Han
Published in: CoRR (2024)
Keyphrases
  • cost effective
  • computer vision
  • neural network
  • social networks
  • image segmentation
  • three dimensional
  • multiscale
  • motion estimation
  • computationally expensive