Achieving Real-Time Execution of Transformer-based Large-scale Models on Mobile with Compiler-aware Neural Architecture Optimization.
Wei NiuZhenglun KongGeng YuanWeiwen JiangJiexiong GuanCaiwen DingPu ZhaoSijia LiuBin RenYanzhi WangPublished in: CoRR (2020)