Spoken Moments: Learning Joint Audio-Visual Representations From Video Descriptions.

Published in: CVPR (2021)

Keyphrases