Masked co-attention model for audio-visual event localization.

Published in: Appl. Intell. (2024)