DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving.
Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, Hao Zhang
Published in: OSDI (2024)
Keyphrases
- language model
- language modeling
- document retrieval
- n gram
- query expansion
- probabilistic model
- information retrieval
- speech recognition
- test collection
- retrieval model
- language modelling
- ad hoc information retrieval
- mixture model
- query terms
- statistical language models
- translation model
- smoothing methods
- language models for information retrieval
- context sensitive
- word error rate
- relevance model
- document collections
- query specific
- statistical machine translation
- cross lingual
- retrieval effectiveness
- error rate
- language model for information retrieval