SpecInfer: Accelerating Large Language Model Serving with Tree-based Speculative Inference and Verification.
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia
Published in: ASPLOS (3) (2024)
Keyphrases
- language model
- language modeling
- probabilistic model
- n gram
- speech recognition
- retrieval model
- information retrieval
- query expansion
- language modelling
- document retrieval
- ad hoc information retrieval
- test collection
- mixture model
- smoothing methods
- bayesian inference
- language models for information retrieval
- bayesian networks
- statistical language models
- pseudo relevance feedback
- query terms
- translation model
- language model for information retrieval
- document ranking
- word error rate
- context sensitive
- word clouds
- query specific
- relevance model
- document length
- retrieval effectiveness