PolyViT: Co-training Vision Transformers on Images, Videos and Audio.

Published in: CoRR (2021)
