Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving.

Yinwei Dai, Rui Pan, Anand P. Iyer, Kai Li, Ravi Netravali

Published in: CoRR (2023)

Keyphrases

low latency
response time
resource utilization
maximum likelihood
high speed
high throughput
prefetching
real time
virtual machine
learning processes