SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices.
Ruslan Svirschevski
Avner May
Zhuoming Chen
Beidi Chen
Zhihao Jia
Max Ryabinin
Published in:
CoRR (2024)
Keyphrases
massively parallel
fine grained
parallel computing
mobile devices
high performance computing
bayesian networks
parallel machines
processing elements
parallel architectures
mesh connected
objective function
embedded systems
lower bound
message passing interface