Focus and Align: Learning Tube Tokens for Video-Language Pre-Training.

Published in: IEEE Trans. Multim. (2023)

Keyphrases