DeepLat: Achieving Minimum Worst Case Latency for DNN Inference with Batch-Aware Dispatching
Jiaheng Gao, Yitao Hu
Published in: ICA3PP (1) (2023)
Keyphrases
- worst case
- online algorithms
- lower bound
- average case
- constant factor
- upper bound
- probabilistic inference
- error bounds
- worst case analysis
- inference process
- minimum cost
- bayesian inference
- greedy algorithm
- response time
- np hard
- bayesian networks
- prefetching
- training process
- bayesian model
- batch processing
- manufacturing systems
- graphical models
- inference engine
- scheduling problem
- active learning
- production scheduling
- neural network