MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions.

Published in: CVPR (2022)

Keyphrases