Audio-Visual Event Localization by Learning Spatial and Semantic Co-Attention.

Published in: IEEE Transactions on Multimedia (2023)

Keyphrases