MOMA: Mixture-of-Modality-Adaptations for Transferring Knowledge from Image Models Towards Efficient Audio-Visual Action Recognition.
Kai Wang, Dimitrios Hatzinakos
Published in: ICASSP (2024)
Keyphrases
- three dimensional
- action recognition
- audio visual
- multi modal
- image data
- image classification
- visual data
- image content
- input image
- human actions
- image features
- human detection
- bag of words
- image retrieval
- activity recognition
- computer vision
- probabilistic model
- image representation
- prior knowledge
- image database
- low level
- visual information
- body parts
- feature points
- image regions
- transfer learning
- image collections
- object recognition
- feature extraction
- multimedia
- information retrieval