InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.

Published in: CoRR (2023)

Keyphrases