MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using Transformers.
Muhammad Bilal Shaikh
Douglas Chai
Syed Mohammed Shamsul Islam
Naveed Akhtar
Published in: EUVIP (2023)
Keyphrases
visual data
multimedia
image data
video files
image content
single image
image features
input image
audio files
key frames
image representation
image regions
image frames
image classification
audio visual
image segmentation
signal processing
digital video
human actions
video data
image retrieval
multiscale
audio video
edge detection
video retrieval
video streams
segmentation method
action recognition
digital audio
multimodal information
computer vision
audio features
video signals
cross modal
video analysis
image collections
video content
multimedia data
visual information