sLLM: Accelerating LLM Inference using Semantic Load Balancing with Shared Memory Data Structures

Jieyu Lin, Sai Qian Zhang, Alberto Leon-Garcia

Published in: ISQED (2024)

Keyphrases

load balancing
shared memory
data structure
low overhead
message passing
parallel algorithm
distributed systems
dynamic load balancing
fault tolerance
peer to peer
distributed memory
parallel computing
mobile agents
parallel machines
fault tolerant
parallel computers
grid computing
load balancing strategy
data replication
pairwise
bayesian networks
shared memory multiprocessor
real time
belief propagation
data management
dynamic programming