MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using Transformers.
Muhammad Bilal Shaikh, Douglas Chai, Syed Mohammed Shamsul Islam, Naveed Akhtar
Published in: EUVIP (2023)
Keyphrases
- visual data
- multimedia
- image data
- video files
- image content
- single image
- image features
- input image
- audio files
- key frames
- image representation
- image regions
- image frames
- image classification
- audio visual
- image segmentation
- signal processing
- digital video
- human actions
- video data
- image retrieval
- multiscale
- audio video
- edge detection
- video retrieval
- video streams
- segmentation method
- action recognition
- digital audio
- multimodal information
- computer vision
- audio features
- video signals
- cross modal
- video analysis
- image collections
- video content
- multimedia data
- visual information