DeFT: Flash Tree-attention with IO-Awareness for Efficient Tree-search-based LLM Inference.

Jinwei Yao, Kaiqi Chen, Kexun Zhang, Jiaxuan You, Binhang Yuan, Zeke Wang, Tao Lin
Published in: CoRR (2024)
Keyphrases