Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification.

Wentao Zhu
Published in: CoRR (2024)
Keyphrases
  • video classification
  • multimedia
  • audio visual
  • visual information
  • visual data
  • search engine
  • computer vision
  • high level
  • feature extraction
  • video shots