Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions.

Published in: CoRR (2021)

Keyphrases