Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads.

Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao
Published in: CoRR (2024)
Keyphrases
  • neural network
  • main contribution
  • theoretical framework
  • data sets
  • information retrieval
  • expert systems
  • lightweight