Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet.

Published in: ICCV (2021)

Keyphrases