Audio-visual scene classification via contrastive event-object alignment and semantic-based fusion.
Yuanbo Hou, Bo Kang, Dick Botteldooren
Published in: MMSP (2022)
Keyphrases
- audio visual
- scene classification
- multi modal
- object recognition
- image classification
- visual information
- natural scenes
- 3D objects
- multimedia
- visual words
- biologically inspired
- image representation
- bag of features
- visual data
- multiscale
- spatial relationships
- spatial relations
- image content
- keypoints
- human computer interaction
- visual features
- training set