Attention-Based Cross-Modal Fusion for Audio-Visual Voice Activity Detection in Musical Video Streams.

Yuanbo Hou, Zhesong Yu, Xia Liang, Xingjian Du, Bilei Zhu, Zejun Ma, Dick Botteldooren
Published in: Interspeech (2021)