Fusion of Audio and Visual Embeddings for Sound Event Localization and Detection.
Davide Berghi, Peipei Wu, Jinzheng Zhao, Wenwu Wang, Philip J. B. Jackson
Published in: ICASSP (2024)
Keyphrases
- soccer video
- visual information
- thermal images
- event detection
- detection algorithm
- cross modal
- visual data
- reliable detection
- detection method
- data fusion
- activity detection
- audio signal
- visual concepts
- low dimensional
- multimedia
- video scene
- multi modal fusion
- localization algorithm
- visual analysis
- fusion method
- multi sensor
- manifold learning
- detection rate
- false positives
- high dimensional data
- video data
- low level
- audio visual
- event recognition
- multi modal
- dimensionality reduction
- accurate localization
- multimodal fusion
- video sequences