Self-Supervised Learning of Audio Representations From Audio-Visual Data Using Spatial Alignment.
Shanshan Wang, Archontis Politis, Annamaria Mesaros, Tuomas Virtanen
Published in: IEEE J. Sel. Top. Signal Process. (2022)
Keyphrases
- visual data
- visual information
- audio visual
- high dimensional
- video sequences
- video data
- visual features
- image features
- visual content
- contextual information
- spatio temporal
- machine learning
- data sets
- image retrieval
- object recognition
- pattern recognition
- spatial data
- image sequences
- spatial databases
- human motion
- multimedia
- computer vision