Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference.

Junyan Li, Li Lyna Zhang, Jiahang Xu, Yujing Wang, Shaoguang Yan, Yunqing Xia, Yuqing Yang, Ting Cao, Hao Sun, Weiwei Deng, Qi Zhang, Mao Yang
Published in: CoRR (2023)
Keyphrases
  • search space
  • tree construction
  • data sets
  • cost effective
  • neural network
  • image sequences
  • pairwise
  • computationally efficient
  • power system
  • learning to rank
  • pruning strategy