SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning.

Published in: CVPR (2022)

Keyphrases