Achieving Real-Time Execution of Transformer-based Large-scale Models on Mobile with Compiler-aware Neural Architecture Optimization.

Published in: CoRR (2020)

Keyphrases