ERNIE-SPARSE: Learning Hierarchical Efficient Transformer Through Regularized Self-Attention.

Published in: CoRR (2022)

Keyphrases