Audio-visual scene classification via contrastive event-object alignment and semantic-based fusion.
Yuanbo Hou, Bo Kang, Dick Botteldooren
Published in: CoRR (2022)
Keyphrases
- audio visual
- scene classification
- multi modal
- object recognition
- visual information
- biologically inspired
- natural scenes
- visual words
- 3D objects
- image classification
- visual data
- multimedia
- image representation
- bag of features
- bag of words
- keypoints
- humanoid robot
- image database
- image retrieval
- multiscale
- information retrieval