Towards Pareto Optimal Throughput in Small Language Model Serving.
Pol G. Recasens, Yue Zhu, Chen Wang, Eun Kyung Lee, Olivier Tardieu, Alaa Youssef, Jordi Torres, Josep Lluis Berral
Published in: EuroMLSys@EuroSys (2024)
Keyphrases
- language model
- pareto optimal
- language modeling
- multi objective
- n gram
- document retrieval
- probabilistic model
- speech recognition
- multi objective optimization
- query expansion
- information retrieval
- retrieval model
- multiple objectives
- language modelling
- ad hoc information retrieval
- mixture model
- relevance model
- smoothing methods
- pseudo relevance feedback
- nsga ii
- language model for information retrieval
- nash equilibrium
- context sensitive
- query terms
- test collection
- translation model
- query specific
- evolutionary algorithm
- statistical language models
- optimal solution
- language models for information retrieval
- neural network
- cross lingual
- metaheuristic
- information retrieval systems
- cooperative