Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification.
Wentao Zhu
Published in:
CoRR (2024)
Keyphrases
video classification
multimedia
audio visual
visual information
visual data
search engine
computer vision
high level
feature extraction
video shots