DeepLat: Achieving Minimum Worst Case Latency for DNN Inference with Batch-Aware Dispatching
Jiaheng Gao, Yitao Hu
Published in: ICA3PP (1) (2023)
Keyphrases
- worst case
- online algorithms
- lower bound
- average case
- constant factor
- upper bound
- probabilistic inference
- error bounds
- worst case analysis
- inference process
- minimum cost
- bayesian inference
- greedy algorithm
- response time
- np hard
- bayesian networks
- prefetching
- training process
- bayesian model
- batch processing
- manufacturing systems
- graphical models
- inference engine
- scheduling problem
- active learning
- production scheduling
- neural network