Self-supervised Learning of Audio Representations from Audio-Visual Data using Spatial Alignment.

Published in: CoRR (2022)
