A study on joint modeling and data augmentation of multi-modalities for audio-visual scene classification.

Published in: CoRR (2022)