Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads.
Tianle Cai
Yuhong Li
Zhengyang Geng
Hongwu Peng
Jason D. Lee
Deming Chen
Tri Dao
Published in:
CoRR (2024)
Keyphrases
neural network
main contribution
theoretical framework
data sets
information retrieval
expert systems
lightweight