sLLM: Accelerating LLM Inference using Semantic Load Balancing with Shared Memory Data Structures.
Jieyu Lin, Sai Qian Zhang, Alberto Leon-Garcia
Published in: ISQED (2024)
Keyphrases
- load balancing
- shared memory
- data structure
- low overhead
- message passing
- parallel algorithm
- distributed systems
- dynamic load balancing
- fault tolerance
- peer to peer
- distributed memory
- parallel computing
- mobile agents
- parallel machines
- fault tolerant
- parallel computers
- grid computing
- load balancing strategy
- data replication
- pairwise
- Bayesian networks
- shared memory multiprocessor
- real time
- belief propagation
- data management
- dynamic programming