MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving.
Leyang Xue
Yao Fu
Zhan Lu
Luo Mai
Mahesh K. Marina
Published in:
CoRR (2024)
Keyphrases
website
information processing
computationally intensive
real time
data sets
artificial intelligence
information systems
bayesian networks
information technology
xml documents
data model
hidden markov models
lightweight
computationally efficient
human experts