Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification.

Wentao Zhu
Published in: CoRR (2024)