Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving.

Yinwei Dai, Rui Pan, Anand P. Iyer, Kai Li, Ravi Netravali
Published in: CoRR (2023)
Keyphrases
  • low latency
  • response time
  • resource utilization
  • maximum likelihood
  • high speed
  • high throughput
  • prefetching
  • real time
  • virtual machine
  • learning processes