A study on joint modeling and data augmentation of multi-modalities for audio-visual scene classification.
Qing Wang, Jun Du, Siyuan Zheng, Yunqing Li, Yajian Wang, Yuzhong Wu, Hu Hu, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Yannan Wang, Chin-Hui Lee
Published in: CoRR (2022)