Helix: Distributed Serving of Large Language Models via Max-Flow on Heterogeneous GPUs.
Yixuan Mei, Yonghao Zhuang, Xupeng Miao, Juncheng Yang, Zhihao Jia, Rashmi Vinayak
Published in: CoRR (2024)
Keyphrases
- language model
- max flow
- language modeling
- n gram
- probabilistic model
- document retrieval
- retrieval model
- query expansion
- information retrieval
- language models for information retrieval
- smoothing methods
- relevance model
- vector space model
- energy minimization
- test collection
- distributed systems
- graph cuts
- graphical models
- multistage
- linear programming
- quadratic programming
- text mining
- convex programming
- least squares
- feature selection
- machine learning